Minimum spanning trees and random resistor networks in d dimensions
We consider minimum-cost spanning trees, both in lattice and Euclidean models, in d dimensions.
For the cost of the optimum tree in a box of size L, we show that there is a correction of order $L^{\theta}$,
where $\theta \leq 0$ is a universal d-dependent exponent. There is a similar form for the change in optimum
cost under a change in boundary condition. At non-zero temperature T, there is a crossover length
$\xi \sim T^{-\nu}$, such that on length scales larger than $\xi$, the behavior becomes that of uniform spanning
trees. There is a scaling relation $\theta = -1/\nu$, and we provide several arguments that show that $\nu$
and $-1/\theta$ both equal $\nu_{\rm perc}$, the correlation length exponent for ordinary percolation in the same
dimension d, in all dimensions $d \geq 1$. The arguments all rely on the close relation of Kruskal's greedy
algorithm for the minimum spanning tree, percolation, and (for some arguments) random resistor
networks. The scaling of the entropy and free energy at small non-zero T, and hence of the number
of near-optimal solutions, is also discussed. We suggest that the Steiner tree problem is in the
same universality class as the minimum spanning tree in all dimensions, as is the traveling salesman
problem in two dimensions. Hence all will have the same value of $\theta = -3/4$ in two dimensions.



I. INTRODUCTION 

Minimum spanning trees are a problem of combinatorial optimization [1,2]. Suppose we are given an
undirected connected graph G, with vertex set V and edge set E, and a cost (or weight, or "length")
$\ell_{ij}$ assigned to each edge $(ij) \in E$ (where $i, j \in V$). The problem is to find a spanning
tree $\mathsf{T}$ (i.e. a connected subgraph of G that includes all vertices in V, but whose edges form
no cycles; such a tree must have $|V| - 1$ edges), such that the total cost of the edges in $\mathsf{T}$,

$\ell(\mathsf{T}) = \sum_{(ij) \in \mathsf{T}} \ell_{ij}$, (1)

is as small as possible. Thus the minimization is over the set $\mathcal{T}$ of spanning trees in G.

In this paper we are interested in the case in which G is a simply-connected portion $\Lambda$ of a
regular lattice in $d \geq 1$ dimensions (with edges connecting nearest-neighbor lattice vertices only;
the nearest-neighbor distance is fixed at 1 throughout this paper), including the case when $\Lambda$
tends to the entire lattice, and the edge costs are independent, identically-distributed random
variables, for example $\ell_{ij}$ uniformly distributed on [0, 1]. We will also consider geometries
with periodic boundary conditions, in which $\Lambda$ has no boundary. The results also apply without
significant modification to cases with other distributions, and/or with short-range correlations of
the $\ell_{ij}$s, and to the Euclidean minimum spanning tree, in which $N = |V|$ points are distributed
independently and uniformly (with density 1) in a portion $\Lambda$ of d-dimensional Euclidean space,
and the cost $\ell_{ij}$ of an edge (ij) is the Euclidean distance between i and j, for any pair $i \neq j$.

The motivation for this work is to understand disordered systems at low temperatures better,
beginning with those in which quantum-mechanical effects are negligible. Here "disordered" means
that the Hamiltonian (or energy as a function of the system configuration) contains random
variables, and the minimum energy must be found for fixed (or "quenched") values of these random
variables. Such systems include classical Ising spin glass models. There is a great deal of overlap
between this field and that of random optimization, including some common models [3]. There is even
a strongly-disordered spin-glass model that maps onto minimum spanning trees [4]. The results in
this paper can be considered as a rare case in which some exact results (or exact mapping to
another problem) can be found for a fairly natural system with quenched disorder.

The questions of interest here include the dependence of the total cost of the minimum spanning
tree (MST) on the size of the system $\Lambda$, and on certain changes of boundary conditions to be
defined below. The expectation value of the cost $\ell_{\rm OPT}$ of the MST is expected to take the
form (overlines denote the average over all $\ell_{ij}$)

$\overline{\ell_{\rm OPT}} = \sum_{i=0}^{d} \beta_i V_{d_i} + \overline{\ell_{\rm fin}}$ (2)



asymptotically as the size of $\Lambda \to \infty$, keeping the shape fixed [5]. Here $\beta_i$ are
non-universal constants (the values of which will change if the $\ell_{ij}$s are correlated, or for
the Euclidean problem), and $V_{d_i}$ are $d_i = (d - i)$-dimensional volumes of $\Lambda$ and its
boundary. That is, $V_d = |V|$ is the d-dimensional volume of $\Lambda$, $V_{d-1}$ is the
(d - 1)-dimensional "area" of the boundary, $V_{d-2}$ is the (d - 2)-dimensional "length" of the
edges of the boundary, ..., down to $V_0$, the number of zero-dimensional corners of $\Lambda$.
$\beta_0 = \beta$ has been extensively studied (see e.g. Ref. [2] for a review), while bounds on
$\beta_1$ have been established in d = 2 (Ref. [6] for the Euclidean case). The most interesting part
is the subsequent terms $\overline{\ell_{\rm fin}}$, the leading corrections to the bulk part of the
cost in a finite-size system. These are shape dependent, and may be difficult to separate from the
term $\beta_d V_0$, since as we will see $\overline{\ell_{\rm fin}}$ can be of order 1 for the MST.
Here for simplicity we will take $\Lambda$ in the form of a hypercube of side L, with periodic
boundary conditions (so all $V_{d_i}$ with i > 0 are zero). Then we find as $L \to \infty$ [5]



$\overline{\ell_{\rm fin}} \sim -\ell_c + A' L^{\theta}.$ (3)



Here $\ell_c$ is the (non-universal) value of the cost of an edge at the percolation threshold, that
is the stage in Kruskal's greedy algorithm [7,1] at which the growing trees percolate across the
system, for $L \to \infty$; in the above model of $\ell_{ij}$ uniformly distributed in [0, 1],
$\ell_c = p_c$, the threshold for bond percolation. Also, $A'$ is a d-dependent non-universal
constant. We will argue that (i) $\theta$ is universal (but depends on d), (ii) $\theta \leq 0$ for
all d, and (iii) in fact

$\theta = -1/\nu_{\rm perc}$, (4)



where $\nu_{\rm perc}$ is the correlation length exponent for classical percolation in d dimensions.
It is known that $\nu_{\rm perc} = 1$ (d = 1), 4/3 (d = 2), and $\nu_{\rm perc} = 1/2$ for $d > d_c$,
where $d_c$ is a critical dimension, $d_c = 6$ for percolation; there are approximate values for
$\nu_{\rm perc}$ for other intermediate d.

We also consider the effect of a change in boundary conditions. We can study the mean change in
optimum cost produced when a constraint is imposed, namely that the tree must possess at least k
distinct branches that cross between two ends of the system, for example between the ends of a
cylinder of length L (in one direction) and width W (in d - 1 directions). We argue that the mean
change in cost per unit length scales as

$\lim_{L \to \infty} \frac{\overline{\ell_{\rm OPT}(k) - \ell_{\rm OPT}}}{L} \sim A'_k W^{\theta - 1}$ (5)

as $W \to \infty$, for all dimensions d, again with $\theta = -1/\nu_{\rm perc}$.

These finite-size corrections to the mean cost, and its sensitivity to boundary conditions, are
analogous to those for the ground-state energy of disordered classical systems, such as spin
glasses [8,9], and the application of such ideas to optimization was begun in Ref. [10]. It was
previously argued [11] for the traveling salesman problem that similar forms hold in d = 2 with
$\theta$ replaced by 0 (and with $L^{\theta}$ in $\overline{\ell_{\rm OPT}}$ replaced by a logarithm
in some cases), and should also hold for MSTs. It now appears that the coefficient A of those
terms [11] is zero, at least for MSTs.

The size-dependent terms in $\overline{\ell_{\rm OPT}}$ are related to the non-zero-temperature
behavior of weighted spanning trees. In this, we give each spanning tree $\mathsf{T}$ a
(Boltzmann-Gibbs) probability proportional to $e^{-\ell(\mathsf{T})/T}$, where T is the temperature.
The probabilities are normalized by dividing by the partition function

$Z = \sum_{\mathsf{T} \in \mathcal{T}} \prod_{(ij) \in \mathsf{T}} e^{-\ell_{ij}/T}.$ (6)

In the limit as $T \to 0$, the sum over trees is dominated by those with the lowest total cost
$\ell$. This approach allows methods of equilibrium statistical mechanics to be applied. We argue
that at a small positive temperature, the entropy per vertex in the limit as the size of $\Lambda$
tends to infinity, s (essentially the logarithm of the number of near-optimal spanning trees
accessible at temperature T, divided by |V|), behaves as

$s \propto T^{\psi}$ (7)



as $T \to 0$, where $\psi$ is another universal exponent, most likely equal to 1 for MSTs (this has
also been discussed in Ref. [12]). Correspondingly, $\langle\varepsilon\rangle$, the change in the
thermal (as well as $\ell_{ij}$) average cost per vertex relative to the optimum, is

$\langle\varepsilon\rangle = \lim_{|V| \to \infty} \frac{\overline{\langle\ell\rangle - \ell_{\rm OPT}}}{|V|} \sim b\,T^{\psi+1}.$ (8)



For a large system, $\langle\varepsilon\rangle$ is the thermal and $\ell_{ij}$ average of the notion
of "fractional relative error" in optimization theory, within a factor of $\beta_0$. Inverting these
formulas implies that the logarithm of the typical number of spanning trees with cost within a
factor $1 + \varepsilon$ of $\ell_{\rm OPT}$ (where "typical" can be made precise using the
Boltzmann-Gibbs probability), divided by |V|, is

$\propto \varepsilon^{\psi/(\psi+1)}$ (9)



as $\varepsilon \to 0$. Note that these formulas are for the limit $|V| \to \infty$ before
$T \to 0$; the arguments that suggest that $\psi = 1$ also suggest that s and
$\langle\varepsilon\rangle$ are dominated by local, independent excitations, with a density of order
$T^{\psi}$, and so there is a length scale $\xi_T \sim T^{-\psi/d}$ such that these results hold for
system size $L \gg \xi_T$.

In addition to the cost, one may also ask about correlation properties of the trees, either at
T = 0 (i.e. for MSTs), or in the positive-T generalization. For example, one may consider the
expected number of trees that possess k distinct branches that cross between two balls separated
by distance r, as a function of r, and so define correlation exponents (see e.g. [13,14]). Another
exponent is obtained from the Hausdorff dimension of the path between two given points on the
(same) tree. These universal exponents serve to distinguish universality classes. One may ask
whether the exponents for the statistics of the MSTs are the same as for uniform spanning trees.
Uniform spanning trees (USTs) arise if we set all $\ell_{ij} = 0$, or put $T = \infty$, in the
positive-temperature weighted spanning trees. Thus, every spanning tree has equal ("uniform")
Boltzmann-Gibbs probability. We will argue the following: nonzero temperature is a relevant
perturbation (in the renormalization-group sense), and leads to a correlation or crossover length
$\xi$ ($\xi \gg \xi_T$ for d > 1), such that for correlation functions over distances much larger
than $\xi$, the behavior of USTs is recovered, even if T is very small. In an infinite system, this
length diverges as

$\xi \sim c\,T^{-\nu}$ (10)

as $T \to 0$. We argue, using results from the extensively-studied related problem of random
(classical) resistor networks (RRNs), which again is related to percolation, that
$\nu = \nu_{\rm perc} = -1/\theta$. That is, $-\theta$ is the scaling dimension for the
temperature T.

These results then imply that if we choose a typical spanning tree with $\ell$ within a factor of
about $1 + \varepsilon$ of $\ell_{\rm OPT}$, then its statistical properties on length scales larger
than $\xi$ are those of USTs. The crossover length scale is
$\xi \sim c' \varepsilon^{-\nu/(\psi+1)}$. When $\xi$ is of order the system size L, or on length
scales smaller than $\xi$, the correlations are those of MSTs, which should be different from those
of USTs, at least in high dimensions d. Arguments by Newman and Stein (NS) [4] show that for MSTs,
for d > 8 the MST in any finite portion of size W of the system breaks up, as
$|\Lambda| \to \infty$, into of order $W^{d-8}$ trees of size of order W, each tree having Hausdorff
dimension 8 (their arguments also used a relation with percolation). Thus 8 is a critical dimension
for MSTs, above which the exponents mentioned above take simple values, related to the Hausdorff
dimension 8 that determines the k-crossing exponents, while (by a simple extension of the arguments
of NS) the Hausdorff dimension of the path between two points becomes 2, as for a Brownian walk. By
contrast, USTs have similar behavior, but consist of trees of Hausdorff dimension 4 for dimension
bigger than 4 [15]. However, a relation between the two in low dimensions, in particular d = 2, has
not been ruled out, and exists, albeit somewhat trivially, in d = 1.

It is interesting that the properties of MSTs fall into 
two parts. For properties involving the costs, the critical 
dimension is argued here to be $d_c = 6$. On the other
hand, the geometric correlations of the trees themselves 
exhibit a critical dimension of 8. We note that the costs 
are independent of the tree geometry in the sense that, 
given the MST, the costs of the edges used cannot be 
recovered (in the lattice models, though this can be done 
in the Euclidean case). In the absence of a field theoretic 
formulation, analogous to that for equilibrium positive-T
critical phenomena, the presence of two distinct critical 
dimensions should not seem so surprising. 

This paper is structured as follows. Section II considers the MST problem, and its nonzero
temperature generalization, for large systems. The main results of this section are the exponent
for the crossover length $\xi$, $\nu = \nu_{\rm perc}$, and the behavior of the entropy and mean
cost (per vertex) at low temperature. In section III A, aspects of finite-size systems are
considered, first for zero temperature (MSTs). Using finite-size scaling arguments for percolation,
the two corrections in $\overline{\ell_{\rm fin}}$ are obtained. The change in cost produced by a
change in boundary condition on a long cylinder is considered in section III B. Finally, scaling at
both finite size and positive temperature is considered. Section IV considers other optimization
problems, including minimum cost Steiner tree, traveling salesman, and minimum weighted matching.
Some of these are argued to be in the same universality class as MSTs.

II. MSTS, RRNS, AND PERCOLATION 

This section begins with a mapping of the general weighted spanning tree problem to the calculation
of a determinant of a Laplacian matrix on G. The resulting linear-algebra problem is related to
other problems of physical interest, including RRNs. This problem is then solved as $T \to 0$, and
related to Kruskal's greedy algorithm and to a class of corresponding percolation problems. At
nonzero temperature, the connection with RRNs gives the behavior (as $T \to 0$) of the crossover
length $\xi$ to uniform spanning tree behavior at large length scales. The entropy and mean extra
cost (per vertex) are considered next, and related to the number of near-optimal spanning trees.
Finally some comments on the mobility edge in the lattice Laplacian are made, in the strong
disorder regime $T \to 0$.

A. Mappings between problems 

The partition function Z can be reformulated as a determinant, by the matrix-tree theorem extended
to include weights $K_{ij} = e^{-\ell_{ij}/T}$ [16],

$Z = \det{}' \Delta$, (11)

where $\det'$ denotes the determinant of a matrix from which any one row and the corresponding
column have been deleted, and $\Delta = N K N^t$ is defined as follows. N is the incidence matrix of
G viewed as a directed graph by adding an arrow to every edge in an arbitrary fashion; then for
vertices i and edges e,

$N_{ie} = \begin{cases} 0 & \text{if } i \text{ is not on } e, \\ 1 & \text{if } i \text{ is the head of } e, \\ -1 & \text{if } i \text{ is the tail of } e. \end{cases}$ (12)

$N^t$ denotes the transpose of N, and K is the diagonal $|E| \times |E|$ matrix with entries
$K(e, e) = K_{ij} = e^{-\ell_{ij}/T}$ for the edge e = (ij).

The matrix $\Delta = N K N^t$ can be regarded as a Laplacian on G. It has a zero mode, the vector
$(1, 1, \ldots, 1)^t$, and is positive semi-definite (if all $\ell_{ij}/T$ are real), as can be seen
by writing $N' = N K^{1/2}$, and $\Delta = N' N'^t$. The deletion of a row and column from $\Delta$
before calculating the determinant removes the zero mode, which would otherwise cause the
determinant to vanish.
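As an illustrative check of Eq. (11), the following minimal sketch compares the determinant of the reduced weighted Laplacian with a brute-force sum over spanning trees on a small graph. The four-vertex graph, its costs, the temperature 0.5, and all function names are assumptions chosen for illustration only; the Laplacian is assembled entry by entry, which is equivalent to forming $N K N^t$.

```python
import itertools
import math


def weighted_laplacian(n, edges, temperature):
    """Dense Laplacian with weights K_ij = exp(-cost / temperature)."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j, cost in edges:
        k = math.exp(-cost / temperature)
        lap[i][i] += k
        lap[j][j] += k
        lap[i][j] -= k
        lap[j][i] -= k
    return lap


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    result = 1.0
    for c in range(size):
        pivot = max(range(c, size), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, size):
            factor = a[r][c] / a[c][c]
            for k in range(c, size):
                a[r][k] -= factor * a[c][k]
    return result


def is_spanning_tree(n, chosen):
    """True if the n - 1 chosen edges contain no cycle (so they span all n vertices)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, _ in chosen:
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            return False
        parent[root_i] = root_j
    return True


def z_by_enumeration(n, edges, temperature):
    """Sum of exp(-cost(tree) / temperature) over all spanning trees."""
    total = 0.0
    for chosen in itertools.combinations(edges, n - 1):
        if is_spanning_tree(n, chosen):
            total += math.exp(-sum(cost for _, _, cost in chosen) / temperature)
    return total


def z_by_determinant(n, edges, temperature):
    """det' of the Laplacian: delete row 0 and column 0, then take the determinant."""
    lap = weighted_laplacian(n, edges, temperature)
    reduced = [row[1:] for row in lap[1:]]
    return determinant(reduced)


# Hypothetical example: the complete graph on 4 vertices with arbitrary costs.
EDGES = [(0, 1, 0.11), (0, 2, 0.83), (0, 3, 0.42),
         (1, 2, 0.27), (1, 3, 0.65), (2, 3, 0.58)]
Z_ENUM = z_by_enumeration(4, EDGES, 0.5)
Z_DET = z_by_determinant(4, EDGES, 0.5)
```

Setting all costs to zero makes every weight equal to 1, so both routines then simply count the spanning trees of the graph, which is the uniform-spanning-tree case.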

Now we suppose, as in the introduction, that the graph G is a portion $\Lambda$ of a d-dimensional
lattice, and that the costs $\ell_{ij}$ are random variables. Then there are some physical problems
that can be associated with the mathematical system defined by $\Delta$. For example, consider the
eigenvalue problem for the matrix $\Delta$,

$\Delta v = \lambda v$. (13)

This is similar to the problem of finding the eigenfrequencies $\pm\sqrt{\lambda}$ for a collection
of unit masses connected by springs with random spring constants $K_{ij} > 0$ (but with scalar
rather than vector displacements), or similarly the spectrum of linearized magnons in a magnet with
random exchange constants. The exact zero mode is associated with the spontaneous breaking of a
symmetry. Such problems have been studied for a long time (see e.g. Refs. [17-19]; Ref. [20]
contains a review), although as $T \to 0$ the probability distribution for $K_{ij}$ that we consider
is particularly broad. The eigenvalue problem is considered further in the following.

Another problem associated with this linear system, which goes back to work by Kirchhoff, is that
of a resistor network. Let $I = (I_e)$ be the column vector of currents (in the direction of the
arrow) along the edges e. In the absence of any external current sources, the net current into any
vertex is zero, that is

$N I = 0$. (14)

If potentials $\phi_i$ are associated with each vertex i [forming a column vector
$\phi = (\phi_i)$], then Ohm's law states that

$I = -K N^t \phi$, (15)

where $K_{ij} = (R_{ij})^{-1}$ is the reciprocal of the resistance (i.e. the conductance) of the
edge e = (ij). Eliminating the currents then gives $\Delta \phi = 0$, which of course is solved by
the zero mode, $\phi$ = constant.

If one wishes to find the resistance between any two vertices, by connecting an external current
source across them, then this also uses the matrix $\Delta$. If a current $J_i$ enters the network
at each vertex i, then forming the column vector $J = (J_i)$, we now have

$N I = -J$, (16)

so $\Delta \phi = J$ ($\sum_i J_i = 0$, otherwise there will be no solutions). Then

$\phi = \Delta'^{-1} J$ (17)

(plus an arbitrary constant), where $\Delta'$ denotes $\Delta$ restricted to the subspace orthogonal
to the zero mode, so that

$\Delta'^{-1} = \sum_{n \neq 0} v_{(n)} v_{(n)}^t / \lambda_n$, (18)

where $\lambda_n$, $v_{(n)}$ are the eigenvalues and normalized eigenvectors of $\Delta$, and the
zeroth eigenvalue $\lambda_0 = 0$ is omitted from the sum. From $\phi$ the current flowing along any
edge in the presence of arbitrary sources J can be found. Then the resistance between vertices i
and j can easily be shown to be

$R_{({\rm equiv})ij} = (\Delta'^{-1})_{ii} + (\Delta'^{-1})_{jj} - 2(\Delta'^{-1})_{ij}$. (19)
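The effective resistance can be illustrated numerically with a minimal sketch. Instead of the eigenvector sum in Eq. (18), the code below uses an equivalent shortcut that is an assumption of this example: it injects a unit current at i, extracts it at j, fixes the arbitrary constant by grounding vertex j, and solves the reduced linear system $\Delta \phi = J$, so that the resistance is the potential at i. The function names and the two test networks are hypothetical, and the graph is assumed to be connected.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    size = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for c in range(size):
        pivot = max(range(c, size), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(size):
            if r != c:
                factor = a[r][c] / a[c][c]
                for k in range(c, size + 1):
                    a[r][k] -= factor * a[c][k]
    return [a[r][size] / a[r][r] for r in range(size)]


def effective_resistance(n, conductances, i, j):
    """Resistance between i and j; conductances is a list of (u, v, K_uv)."""
    if i == j:
        return 0.0
    lap = [[0.0] * n for _ in range(n)]
    for u, v, k in conductances:
        lap[u][u] += k
        lap[v][v] += k
        lap[u][v] -= k
        lap[v][u] -= k
    # Unit current enters at i and leaves at j, so the sources sum to zero.
    current = [0.0] * n
    current[i] = 1.0
    current[j] = -1.0
    # Ground vertex j (phi_j = 0) by deleting its row and column.
    keep = [v for v in range(n) if v != j]
    reduced = [[lap[r][c] for c in keep] for r in keep]
    phi = solve(reduced, [current[r] for r in keep])
    return phi[keep.index(i)]


# Hypothetical checks: resistors of 2 and 3 in series, and a triangle of unit resistors.
SERIES = [(0, 1, 1.0 / 2.0), (1, 2, 1.0 / 3.0)]
TRIANGLE = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
R_SERIES = effective_resistance(3, SERIES, 0, 2)
R_TRIANGLE = effective_resistance(3, TRIANGLE, 0, 1)
```

The series network should reproduce the sum of the two resistances, and the triangle a unit resistor in parallel with two in series.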

One popular version of the random resistor network problem is that in which the resistors $R_{ij}$
on the edges are either a constant R, or infinity, with independent probabilities p, 1 - p
respectively. This has an obvious connection with percolation [21]. In this paper we are instead
interested in the case where $R_{ij}$ has a continuous, but very broad distribution, as in
Ref. [22]. The specific form in which we are interested, because of its connection with weighted
spanning trees, is $R_{ij} = e^{\ell_{ij}/T}$, with $\ell_{ij}$ random variables, and T going to
zero (it arises, for example, if $\ell_{ij}$ is the Euclidean distance between vertices i and j that
represent localized states, and T is the localization length, treated as a constant; this is one
aspect considered in Ref. [22]). This form also has a less obvious connection with percolation, as
we will see. Our simplest model, in which $\ell_{ij}$ are independent and uniformly distributed on
[0,1], has been studied before [23,24,21,25]. The distribution of conductances on the edges is then
$P(K_{ij}) = T/K_{ij}$ for $K_{ij} \in [e^{-1/T}, 1]$.
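The conductance distribution follows from a change of variables, which can be written out as a short worked step (here K stands for a single $K_{ij}$ and $\ell$ for the corresponding cost, uniform on [0,1]):

```latex
K = e^{-\ell/T}
\quad\Longrightarrow\quad
\ell = -T \ln K ,
\qquad
P(K) = P(\ell)\left|\frac{d\ell}{dK}\right| = \frac{T}{K}
\quad \text{for } e^{-1/T} \le K \le 1 ,
\qquad
\int_{e^{-1/T}}^{1} \frac{T}{K}\, dK = T \cdot \frac{1}{T} = 1 .
```

The density is spread over a range of $\ln K$ of width 1/T, which is what makes the distribution very broad as $T \to 0$.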

B. Solution of eigenvalue problem as $T \to 0$

The next step we will take is to study the eigenvalue problem for strong disorder, T small, first
in the extreme limit as $T \to 0$ for a fixed finite graph G with given weights $\ell_{ij}$. In this
limit, the eigenvalues and eigenvectors are determined by a simple procedure that is related both
to the greedy (Kruskal [7]) algorithm which solves the MST problem [1], and to the real-space
renormalization group method for strong disorder that has been applied to quantum problems (from
this point of view, $\Delta$ is the Hamiltonian for a one-particle hopping problem). Since $\Delta$
contains terms that vary greatly in magnitude, we may begin by finding the largest $K_{ij}$, all
other terms being negligible compared with this (since we are interested eventually in the random
version with a continuous distribution, in which with probability one no two $K_{ij}$ are equal, we
neglect the possibility of equal $K_{ij}$s). Let us relabel the vertices so that those connected by
the largest $K_{ij}$ are 1 and 2. At this level of approximation, the matrix breaks into a
$2 \times 2$ block, and |V| - 2 other $1 \times 1$ zero blocks. The $2 \times 2$ block has a
normalized eigenvector $(1, -1)^t/\sqrt{2}$ that has eigenvalue $2K_{12}$, and another eigenvector
$(1, 1)^t$ with eigenvalue 0. Then we find the next strongest $K_{ij}$. This either connects two
vertices (which can be relabeled as 3, 4) distinct from 1 and 2, or else it connects either 1 or 2
to a vertex 3 (we may relabel so that it is $K_{23}$). In the first case, two eigenvectors of the
3-4 block can be found as for 1 and 2. In the second case, in the strong disorder ($T \to 0$)
limit, $K_{12}$ is much larger than $K_{23}$. We have a situation of degenerate perturbation theory,
in which the eigenvalue $2K_{12}$ has a negligible correction from $K_{23}$, while the remaining
|V| - 1 orthogonal vectors have zero eigenvalue when $K_{23}$ is neglected. When $K_{23}$ is
included, we derive a reduced Hamiltonian by projecting the $K_{23}$ terms to the subspace of zero
eigenvalues of the previous step. This contains only one $2 \times 2$ nonzero block, and it turns
out that this produces a nonzero eigenvalue $3K_{23}/2$, with normalized eigenvector
$(1, 1, -2, 0, \ldots)^t/\sqrt{6}$ in the original basis, as well as a zero mode
$(1, 1, 1, 0, \ldots)^t/\sqrt{3}$. Hence the subspace of remaining zero modes has a basis that
consists of the latter vector, which involves three vertices that have been connected by the
couplings $K_{12}$ and $K_{23}$, and |V| - 3 vectors, each for a single vertex that has not yet been
connected. These form the degenerate subspace within which the next largest $K_{ij}$ must be
considered. Similarly, in the first case, the zero-mode subspace has a basis that consists of two
eigenvectors that involve two vertices each, and |V| - 4 that involve one each.

This procedure can be easily iterated. After each step, the space of remaining zero modes possesses
a natural basis with one basis vector for each of a number of clusters of vertices, which have been
connected by the couplings $K_{ij}$ that were considered at earlier stages. For each cluster, of say
n vertices, the zero-mode eigenvector is a non-zero constant on those vertices, and zero elsewhere.
The next strongest $K_{ij}$ that has not already been considered (or "tested") must be projected
into this zero-mode subspace. One additional possibility occurs in general, as the $K_{ij}$ are
considered in decreasing order. Sometimes the next strongest $K_{ij}$ connects two vertices that are
already in the same cluster. In this case, the resulting $1 \times 1$ block produces an eigenvalue 0
and no change in the eigenvector. Thus these couplings may be ignored. The interesting inductive
step thus involves a $K_{ij} = K$ that couples two zero-mode clusters containing, say, n and m
vertices respectively. The projected matrix in the subspace spanned by these two normalized
eigenvectors takes the form

$\begin{pmatrix} K/n & -K/\sqrt{nm} \\ -K/\sqrt{nm} & K/m \end{pmatrix}$, (20)

and has eigenvalues $(n + m)K/(nm)$, with eigenvector $(\sqrt{m}, -\sqrt{n})^t/\sqrt{n + m}$, and
zero, with eigenvector $(\sqrt{n}, \sqrt{m})^t/\sqrt{n + m}$. In the original basis, the zero-mode
eigenvector is again of the form of a constant on the connected cluster of n + m vertices and zero
elsewhere, which allows the induction to proceed. This procedure can be followed until |V| - 1
non-zero eigenvalues have been found, and there is the one remaining zero mode of $\Delta$ itself,
which in the original basis is $(1, 1, \ldots, 1)^t/|V|^{1/2}$.
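The inductive step can be checked directly. The following minimal sketch builds the projected $2 \times 2$ matrix for two clusters of n and m vertices joined by a coupling K, and verifies that the two vectors quoted above are eigenvectors with eigenvalues $(n + m)K/(nm)$ and 0. The function names and the sample values of n, m, and K are assumptions made for illustration.

```python
import math


def projected_block(n, m, coupling):
    """Coupling between clusters of n and m vertices, projected onto their zero modes."""
    off_diagonal = -coupling / math.sqrt(n * m)
    return [[coupling / n, off_diagonal], [off_diagonal, coupling / m]]


def apply(matrix, vector):
    """Matrix-vector product for small dense matrices."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def check_merge(n, m, coupling):
    """Return the predicted eigenvalue and the residuals of the two eigenvector equations."""
    block = projected_block(n, m, coupling)
    norm = math.sqrt(n + m)
    odd = [math.sqrt(m) / norm, -math.sqrt(n) / norm]   # non-zero eigenvalue
    even = [math.sqrt(n) / norm, math.sqrt(m) / norm]   # new zero mode
    eigenvalue = (n + m) * coupling / (n * m)
    residual_odd = max(abs(a - eigenvalue * b) for a, b in zip(apply(block, odd), odd))
    residual_even = max(abs(a) for a in apply(block, even))
    return eigenvalue, residual_odd, residual_even


# The case n = 2, m = 1 is the K_23 step discussed earlier in the text.
EIGENVALUE_21, RESIDUAL_ODD_21, RESIDUAL_EVEN_21 = check_merge(2, 1, 1.0)
```

For n = 2, m = 1 the predicted eigenvalue is 3K/2, in agreement with the value found for the coupling $K_{23}$.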



We see that this procedure takes the $K_{ij}$ in sequence, beginning with the largest
(corresponding to the smallest $\ell_{ij}$), and discarding those that connect vertices that have
already been connected. Hence at each step, the clusters of vertices formed by the zero modes each
take the form of a tree, connected by the stronger couplings $K_{ij}$ that correspond to non-zero
eigenvalues, but which do not form a cycle. The clusters form a spanning forest of trees (some trees
may contain only a single vertex and no edges), until the last step at which a single spanning tree
is formed. This procedure of constructing a tree by adding the lowest-cost edges unless they form a
cycle is exactly Kruskal's greedy algorithm for finding the MST [7]. To see that it solves the MST
problem, we may construct the partition function. The determinant $\det' \Delta$ is essentially the
product of the non-zero eigenvalues of $\Delta$. We have shown that this product is approximately
$|V| e^{-\sum_{(ij) \in \mathsf{T}} \ell_{ij}/T}$, where $\mathsf{T}$ is the spanning tree obtained
by the above procedure (the factors $(n + m)/(nm)$ from the successive mergers multiply to |V|). The
removal of one row and column before calculating the determinant removes the factor |V|. Our
approach has constructed the leading term in the partition function as $T \to 0$, and gives a proof
that the greedy algorithm is correct (there are of course other ways to show that [1], without
linear algebra, but the present approach will be useful to us).
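The procedure just described can be sketched as a short union-find implementation of Kruskal's algorithm. In addition to the tree, the sketch records the factor $(n + m)/(nm)$ for each merger of clusters of sizes n and m, so that one can confirm that these factors multiply to |V|, and it compares the greedy cost with a brute-force minimum over all spanning trees. The five-vertex example graph, its costs, and the function names are assumptions made for illustration.

```python
import itertools
from fractions import Fraction


def kruskal(n, edges):
    """Greedy MST; also returns the factor (n_a + n_b) / (n_a * n_b) for each merger."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, prefactors = [], []
    for i, j, cost in sorted(edges, key=lambda edge: edge[2]):
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            continue  # the edge would close a cycle, so it is discarded
        prefactors.append(Fraction(size[root_i] + size[root_j],
                                   size[root_i] * size[root_j]))
        parent[root_i] = root_j
        size[root_j] += size[root_i]
        tree.append((i, j, cost))
    return tree, prefactors


def brute_force_min_cost(n, edges):
    """Minimum total cost over all spanning trees, by exhaustive enumeration."""
    best = None
    for chosen in itertools.combinations(edges, n - 1):
        accepted, _ = kruskal(n, chosen)
        if len(accepted) == n - 1:  # every chosen edge was accepted: no cycle
            cost = sum(c for _, _, c in chosen)
            best = cost if best is None else min(best, cost)
    return best


# Hypothetical connected graph on 5 vertices with distinct costs in [0, 1].
EDGES = [(0, 1, 0.31), (0, 2, 0.74), (1, 2, 0.12), (1, 3, 0.55),
         (2, 3, 0.48), (2, 4, 0.93), (3, 4, 0.26), (0, 4, 0.67)]
TREE, PREFACTORS = kruskal(5, EDGES)
GREEDY_COST = sum(cost for _, _, cost in TREE)
PREFACTOR_PRODUCT = Fraction(1)
for factor in PREFACTORS:
    PREFACTOR_PRODUCT *= factor
```

Exact rational arithmetic is used for the merger factors so that the telescoping product can be compared with |V| without rounding error.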

C. Connection with percolation 

It is of interest to study the structure of the eigenvectors of $\Delta$, especially in a large
portion $\Lambda$ of the d-dimensional cubic lattice ($\Lambda$ will be assumed to be a connected
domain with a smooth boundary, such as a cube). First we establish a connection with percolation.
Suppose that the set of costs $\ell_{ij}$ is given. Then at a step where all edges of cost
$\ell_{ij} < \ell$ have been tested, the clusters formed by the zero modes can be thought of as (a
sample of) bond percolation clusters (even when a probability distribution on the $\ell_{ij}$ has
not been specified). Moreover, if we are only interested in which vertices are connected in the
clusters that represent the zero modes at a particular step, then it makes no difference to include
the edges that were tested earlier but discarded as they formed a cycle. Now we will suppose that
the $\ell_{ij}$ are random variables, but not necessarily that the costs for distinct edges are
statistically independent (note that this includes the Euclidean model, as well as general lattice
models). If all edges with cost $\ell_{ij} < \ell$ are "occupied", then we have a general form of
bond percolation, with correlated bond-occupation probabilities. We will always assume that the
correlations in the $\ell_{ij}$ are short-ranged (falling, say, exponentially with distance), and
translationally-invariant, and that the cumulative probability for any single $\ell_{ij}$ is
continuous. In percolation, there is a percolation threshold at $\ell = \ell_c$, such that in the
limit $\Lambda \to \mathbb{Z}^d$, for $\ell < \ell_c$ any connected cluster is finite (with
probability one), while for $\ell > \ell_c$ there is a single infinite cluster, as well as many
finite ones (except when $\ell$ reaches the supremum of the support of the probability density of
$\ell_{ij}$). In the simplest model that we use, which contains the generic (or universal) behavior
of short-range correlated percolation, the costs $\ell_{ij}$ are statistically independent, and each
is distributed uniformly in [0,1]. The corresponding percolation model is then that in which the
bonds (edges of $\Lambda$) are occupied (independently) with probability $p = \ell$, and unoccupied
with probability 1 - p. The percolation threshold in this model will be denoted $p_c$. In this
model, in one dimension, $p_c = 1$, and in two dimensions $p_c = 1/2$ on the square lattice, by
duality arguments. In the Euclidean model of MSTs, each $\ell_{ij}$ is the Euclidean distance
between i and j, where the |V| points are (in the simplest Euclidean model) independently and
uniformly distributed over the domain $\Lambda$ (with density 1). In this model, the corresponding
percolation problem becomes (the Voronoi, or "lily pad", form of) continuum percolation.

In the simplest, independent-edge, model of bond percolation, the finite clusters above and below
$p_c$ have typical size $\xi_{\rm perc}$, which diverges as $p \to p_c$ as
$\xi_{\rm perc}(p) \sim |p - p_c|^{-\nu_{\rm perc}}$, where $\nu_{\rm perc}$ is a universal
d-dependent exponent. As $p \to p_c$, these typical clusters are fractals with Hausdorff dimension
$D_{\rm perc}$. For d > 6, $\nu_{\rm perc} = 1/2$ and $D_{\rm perc} = 4$; the clusters behave as
branched polymers (trees) with no, or negligibly many, cycles (even though cycles are not forbidden
in percolation). These properties are also believed to hold, with the same exponents, for the more
general models with short-range correlations of the $\ell_{ij}$, with $\ell$ ($\ell_c$) in place of
p ($p_c$, respectively), provided that the probability density for each single $\ell_{ij}$ is smooth
at $\ell_c$. $\ell_c$ is non-universal, that is it depends on the details of the probability
distribution. In the following, results will be given in terms of the simplest model, but hold
equally for the other models.
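The statement that the clusters of the growing forest coincide with bond percolation clusters at occupation level $\ell$, whether or not the discarded edges are included, can be illustrated with a minimal sketch on a small square lattice. The lattice size, the random seeds, the open (non-periodic) boundary, and all names below are assumptions chosen for illustration.

```python
import random


def lattice_edges(side, seed):
    """Nearest-neighbor edges of a side x side square lattice with uniform [0, 1) costs."""
    rng = random.Random(seed)
    edges = []
    for y in range(side):
        for x in range(side):
            v = x + side * y
            if x + 1 < side:
                edges.append((v, v + 1, rng.random()))
            if y + 1 < side:
                edges.append((v, v + side, rng.random()))
    return edges


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the clusters of a and b; return False if they were already joined."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        self.parent[root_a] = root_b
        return True


def clusters(n, edges):
    """Partition of the n vertices into connected clusters of the given edges."""
    uf = UnionFind(n)
    for i, j, _ in edges:
        uf.union(i, j)
    groups = {}
    for v in range(n):
        groups.setdefault(uf.find(v), set()).add(v)
    return {frozenset(group) for group in groups.values()}


def kruskal_forest(n, edges, level):
    """Edges accepted by the greedy algorithm after testing all edges of cost < level."""
    uf = UnionFind(n)
    forest = []
    for i, j, cost in sorted(edges, key=lambda edge: edge[2]):
        if cost >= level:
            break
        if uf.union(i, j):
            forest.append((i, j, cost))
    return forest


def same_clusters(side, seed, level):
    """Compare percolation clusters (all occupied edges) with the greedy forest."""
    n = side * side
    edges = lattice_edges(side, seed)
    occupied = [edge for edge in edges if edge[2] < level]
    return clusters(n, occupied) == clusters(n, kruskal_forest(n, edges, level))
```

Calling `same_clusters(6, 0, 0.5)`, for example, compares the two partitions for one sample at the square-lattice threshold value $p_c = 1/2$ quoted above.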

The relation we have described of the growing trees 
in Kruskal's algorithm to percolation is similar to that 
[4,26-29] between Prim's algorithm [30,1] (which for a 
given finite sample ultimately produces the same MST) 
and invasion percolation [31]. Invasion and ordinary per- 
colation (at the percolation threshold) are believed to be 
in the same universality class. 

The eigenvectors with non-zero eigenvalues are always a combination of two clusters from the
preceding step in the algorithm, that are connected by the next-strongest coupling $K_{ij}$, with
amplitudes that are constant on each of the two clusters. More precisely, the amplitudes are

$\frac{1}{\sqrt{n + m}} \sqrt{\frac{m}{n}}$ (21)

for each vertex on the cluster of n vertices, and minus the same but with n and m interchanged on
the cluster of m vertices. Hence, for $\ell_{ij} < p_c$, where both clusters typically have size of
order $\xi_{\rm perc}$ (evaluated at $p = \ell_{ij}$), the eigenvector is localized on a length
scale also of order $\xi_{\rm perc}$. For $\ell_{ij} > p_c$, there is an infinite cluster, i.e. one
that occupies a finite fraction of the vertices as $\Lambda \to \mathbb{Z}^d$. In this case, by
letting $n \to \infty$, we find that the normalized eigenfunction is concentrated on the finite
cluster of m vertices, and so is also localized, with localization length diverging as
$\xi_{\rm perc}$ as $p \to p_c$. Thus, with the exception of the zero mode, in the strong disorder
limit all eigenvectors of $\Delta$ are localized, except at $p \to p_c$ where the localization
length diverges. The mean localization length presumably increases monotonically as the $\ell_{ij}$
corresponding to the eigenvalue increases to $p_c$, then for $\ell_{ij} > p_c$ decreases
monotonically as $\ell_{ij} \to 1$.



D. Effective resistance in the strong disorder limit 

We now apply the preceding results to the effective resistance between any two vertices,
$R_{({\rm equiv})ij}$, using Eqs. (19) and (18), in the strong disorder ($T \to 0$) limit.

Each eigenvector has the structure described in the previous section, with constant amplitude on
two clusters of sizes n, m connected by the next strongest coupling, K say, and zero elsewhere, and
can only contribute to $R_{({\rm equiv})ij}$ if at least one of i, j lies on one of the clusters.
Suppose there is nonzero amplitude at both i and j. If both are in the same cluster, then the
contributions to $R_{({\rm equiv})ij}$ cancel. If they are on opposite clusters, the contribution to
$R_{({\rm equiv})ij}$ is

$\frac{1}{m + n}\left(\frac{m}{n} + \frac{n}{m} + 2\right) \frac{nm}{(n + m)K}$, (22)

which simplifies to 1/K. Finally, if one of i, j, say i, is on a cluster (say, the one of n
vertices) but the other j is not, then the contribution is $m^2/[(n + m)^2 K]$.

In the procedure that generates the eigenvectors, the sizes of the clusters are monotonically
increasing. For given i and j, the situation that one of i, j is on one cluster, the other on the
other occurs only once, at the stage where those two clusters get connected, so there is only a
single contribution of the form 1/K. The situation where only one of i and j is in a cluster occurs
at larger values of the couplings than this K, and so gives contributions that are much smaller
than 1/K. For smaller couplings than K, both vertices are in the same cluster, or neither is on a
cluster. Then as $T \to 0$, this single term 1/K dominates the equivalent resistance. This is
consistent with the picture that in the strong disorder limit, the current from i to j is carried
along a single non-self-intersecting path of edges, such that the sum of resistances along the path
is minimized. However, the total resistance of a path is dominated by the largest resistance on the
path, and this is exactly the resistance 1/K.

We see that the current must pass through the edge of
resistance $1/K$ that we have singled out, in a particular
direction that is also determined (this could be verified
also by calculating the current on any edge, using formulas
from the previous section). Then the current injected
at $i$ must pass along the edges to the correct end of this
edge. In the strong disorder limit, we may use the above
arguments again to find the resistance between these vertices,
which is again dominated by a single resistor of
resistance $< K^{-1}$. This construction can be repeated until
the complete path of lowest resistance from $i$ to $j$ has
been found. Each resistor on the path is one of those
that corresponds to a non-zero eigenvalue of $\Delta$, and so
lies on the MST. It follows that in the strong disorder
limit, the path of least resistance between any two vertices
lies along the MST. In other words, the MST is the
solution to the following problem (the all-pairs minimax
path problem) [32]: given a "resistance" on each edge of a
connected graph $G$, for each pair of vertices $i$, $j$, find the
path from $i$ to $j$ that minimizes the value of the largest
resistance on the path, and take the union of these paths
over all pairs of vertices $i$, $j$.
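The minimax-path property can be checked on a small example. The following is a minimal sketch, not part of the original argument: the five-vertex graph `EDGES`, its costs, and all function names are assumptions chosen for illustration. It builds the MST with Kruskal's greedy algorithm, computes the all-pairs minimax values independently by a Floyd-Warshall-style recursion, and compares them with the largest cost on the tree path.

```python
def kruskal_mst(n, edges):
    """Greedy (Kruskal) MST of a connected graph; edges are (cost, i, j)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for cost, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two distinct clusters
            parent[ri] = rj
            tree.append((cost, i, j))
    return tree


def bottleneck_on_tree(n, tree, src, dst):
    """Largest edge cost on the unique tree path from src to dst."""
    adj = {v: [] for v in range(n)}
    for cost, i, j in tree:
        adj[i].append((j, cost))
        adj[j].append((i, cost))
    stack = [(src, -1, 0.0)]
    while stack:
        v, prev, worst = stack.pop()
        if v == dst:
            return worst
        for w, cost in adj[v]:
            if w != prev:  # a tree has no cycles, so this suffices
                stack.append((w, v, max(worst, cost)))
    raise ValueError("dst is not reachable from src")


def minimax_all_pairs(n, edges):
    """best[i][j] = minimum over all paths of the largest edge cost."""
    inf = float("inf")
    best = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for cost, i, j in edges:
        best[i][j] = best[j][i] = min(best[i][j], cost)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                best[i][j] = min(best[i][j], max(best[i][k], best[k][j]))
    return best


# Hypothetical example graph with distinct edge costs ("resistances").
N = 5
EDGES = [(0.9, 0, 1), (0.2, 0, 2), (0.5, 1, 2), (0.7, 1, 3),
         (0.3, 2, 3), (0.8, 3, 4), (0.4, 2, 4), (0.6, 0, 4)]

tree = kruskal_mst(N, EDGES)
best = minimax_all_pairs(N, EDGES)
agree = all(bottleneck_on_tree(N, tree, i, j) == best[i][j]
            for i in range(N) for j in range(N))
```

For this graph `agree` should be true: the tree path between any pair already attains the minimax value, which is the statement made above for the strong disorder limit.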



E. Effect of small nonzero temperature 

Now we turn to the behavior at a small non-zero tem- 
perature T, which means a finite strength of disorder; 
here, we present arguments using only percolation the- 
ory, leaving the behavior of the eigenvalue problem for a 
later section. 

RRNs in $d$ dimensions with resistances of the form
$R_{ij} = e^{\ell_{ij}/T}$ for $(ij)$ an edge connecting nearest neighbors,
$R_{ij} = \infty$ otherwise, have been considered in several
earlier works [22-25]. As the distribution of resistances
is very broad for $T$ small, the following picture of the
network emerges. If we consider the clusters that are
connected by resistors with $\ell_{ij} < \ell$, then for $\ell < \ell_c$ these
do not percolate. They consist of low resistances, which
can be considered to be essentially zero (like superconducting
links). Resistors with $\ell_c - T < \ell_{ij} < \ell_c + T$ (the
exact coefficient of $T$ in these bounds is not precisely defined,
but is of order 1, and is set to 1 for illustration) are
all of a similar magnitude, and connect the superconducting
clusters into a network that spans a positive fraction
of the system. Finally, the resistors with $\ell_{ij} > p_c + T$
connect other clusters to this network, but these clusters
are shorted out by the lower resistors and do not contribute
to conduction on large scales. On large scales,
the resistance or conductance of the system is that of
an effectively uniform medium described by a conductivity
$\sigma$ (note that the conductance [the reciprocal of the
resistance] of a cube of size $L$ is $\Sigma = \sigma L^{d-2}$), with

$$\sigma \propto e^{-\ell_c/T} \times \begin{cases} T^{(d-2)\nu_{\rm perc}}, & d < 6,\\ T^{2}, & d > 6. \end{cases} \tag{23}$$

This arises as follows: there is a conductance of around
$e^{-\ell_c/T}$ for each "critical" edge [22], and negligible resistance
for the clusters connected by these edges. Then
for dimensional reasons, the density of critical edges that
connects clusters contributes a factor of inverse length to the $d-2$
power, and this length must be the size of the clusters
used, which is $\xi_{\rm perc}(p = p_c - T) \propto T^{-\nu_{\rm perc}}$. For $d > 6$,
there is an additional power $\xi_{\rm perc}^{d-6}$, which is the number
of distinct connected percolation clusters in a window of
size $\xi_{\rm perc}$ at criticality [21] (this number is of order one
for $d < 6$; this is the breakdown of hyperscaling relations
for $d > 6$, expressed in terms of the geometry of the
clusters [21,33]); these distinct conducting channels add
since they are in parallel. The possibility of an additional
power of $T$ (as would occur in some different models of
RRNs [21]) was investigated, and bounds on its exponent
were found [23]. Le Doussal [24] argued that the power
of $T$ in $\sigma$ is exactly as given in eq. (23). It should be
noted that in these earlier works the length scale above
which the effective medium, with negligible fluctuations
in conductivity, applies is $\xi_{\rm perc} \propto T^{-\nu_{\rm perc}}$. This length
scale has also been identified in a recent work that examined
finite-size scaling properties of the RRN [25]. This
length scale is an important result for weighted spanning
trees (i.e. MSTs at positive $T$) as well:

$$\xi \sim T^{-\nu}, \quad \text{with } \nu = \nu_{\rm perc}. \tag{24}$$


F. Cost and entropy at positive temperature 

In this section, we address the positive-temperature 
properties of weighted spanning trees directly, that is in 
terms of trees, not resistor networks. 

The most elementary excitation of a spanning tree is to 
move an edge. By this we mean that an edge on the tree 
is removed, thus cutting the tree into two parts, which are 
then reconnected by adding a different edge (not on the 
initial tree). The change in cost is simply the difference 
of the costs of the two edges involved. All spanning trees 
can be reached from the MST by successive operations 
of this type [1]. Starting from any spanning tree, then 
because our models assume a continuous distribution of 
costs for the edges, with probability one either it is the 
unique MST, or it is possible to move one edge such that 
the total cost decreases [34]. Hence there are no true
"metastable states" (i.e. local, but not global, minima 
with respect to moving a single edge) in the MST prob- 
lem, at least not on a finite graph as assumed in these 
arguments. 

At low temperatures $T$, there will be thermal excitation
of single-edge moves, which can occur independently.
Consider the following situation. In the greedy
algorithm, suppose that edge $(ij)$ is added to the MST
when $\ell_{ij} = \ell$. Suppose further that, before this edge is
added, the trees (clusters) already grown are such that
$(ij)$ and one other distinct edge $(kl)$ would form a cycle
if both were added. Then adding $(ij)$ to the tree
prevents the subsequent addition of $(kl)$. But if $(kl)$ is
added instead of $(ij)$, then this connects the same two
clusters, and hence does not affect which edges can be
added at subsequent stages of the greedy algorithm, that
is at $\ell > \ell_{ij}$. As such pairs of edges will be found at all
stages of the greedy algorithm, there will be many such
pairs of edges in the MST, each of which may be excited
(one edge replaced by its partner) independently of the
states of the other pairs.

If we consider only these pairs, then a simple picture
of "two-level systems" (TLSs) [35] emerges, that should
be useful at low $T$: other than the MST, the spanning
trees that contribute to the partition function differ from
the MST only by having one or more of the edge-pairs
excited, and these can be excited independently. The
partition function within this picture can be calculated
easily, if the excitation costs $\ell_{kl} - \ell_{ij}$ are given. One
needs some information about the probability distribution
of these excitation costs. Let us consider only values
of $\ell_{ij}$ (as before, this is the lower of the costs for each
pair) that are bounded away from the critical value $\ell_c$.
Then the sizes of the clusters connected by $\ell_{ij}$ at that
stage of the greedy algorithm are of order $\xi_{\rm perc}(\ell_{ij})$ [for
cases where $\ell_{ij} > \ell_c$, we mean the size of the finite
cluster(s) involved], which is bounded. When $T$ is small, the
pairs in which we are interested have $\ell_{kl} - \ell_{ij}$ of order
$T$ or less, and occur with density tending to zero with
$T$; hence the mean spacing between them is much larger
than their size in this limit. It is reasonable to imagine
that their excitation costs are statistically independent,
and that the probability density for the excitation cost
of each approaches a constant as $\ell_{kl} - \ell_{ij} \to 0$ (the
constant might depend on $\ell_{ij}$, but this is not important).
For example, one can estimate this probability density,
and check statistical independence of distinct TLSs, for
the case when the cycle involved is an elementary square
of side 1. We introduce the standard (canonical ensemble)
statistical mechanics definitions of the free energy
$F = -T \ln Z$, entropy $S = -\partial F/\partial T$, and internal
energy (or cost) $E = F + TS = T^2\, \partial \ln Z/\partial T$. The
entropy can be thought of as the logarithm of the number
of trees with cost less than the corresponding value of $E$
(this microcanonical-ensemble definition will agree with
the canonical definition in the limit $|V| \to \infty$ with $S$ and
$E \propto |V|$, and with $T$ fixed, as used here). It follows from
the TLS model that at temperature $T$, the entropy per
vertex, $s = \lim_{|V| \to \infty} S/|V|$, behaves as

$$s \propto T^{\psi} \tag{25}$$

as $T \to 0$, while the thermal average excitation cost per
vertex $\langle\varepsilon\rangle = \lim_{|V| \to \infty} (E - \ell_{\rm opt})/|V|$ behaves as

$$\langle\varepsilon\rangle \propto T^{\psi+1} \tag{26}$$

in the same limit, where $\psi = 1$. Since we have included
only a subset of the possible excitations, these statements
should be taken as a lower bound on $s$, so that $\psi \le 1$.
This notion of TLS is generic for many disordered systems
[35], and the behavior $s \propto T$ is typical for these
applications. (A similar picture of TLSs for MSTs was
also used in Ref. [12] to obtain the behavior of the cost of
the minimum spanning tree that differs from the global
MST by a given fraction of edges.) Note that in these
statements we did not need to explicitly perform the disorder
average, as the thermodynamic $|V| \to \infty$ limit of
these quantities self-averages.
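The exponents in eqs. (25) and (26) can be illustrated numerically. The following is a minimal sketch under stated assumptions: independent two-level systems with excitation cost $\Delta = \ell_{kl} - \ell_{ij}$, a constant density `rho` of such systems per vertex and per unit $\Delta$, and a cutoff `delta_max`; these names and values are ours, not the paper's. For one TLS, $Z = 1 + e^{-\Delta/T}$, and averaging over a flat distribution of $\Delta$ should give $s \approx (\pi^2/6)\rho T$ and $\langle\varepsilon\rangle \approx (\pi^2/12)\rho T^2$ when $T \ll$ `delta_max`.

```python
import math


def tls_entropy(delta, T):
    """Entropy of a single two-level system with gap delta at temperature T."""
    x = delta / T
    w = math.exp(-x)  # Boltzmann weight of the excited state
    return math.log1p(w) + x * w / (1.0 + w)


def tls_energy(delta, T):
    """Thermal average excitation cost of a single two-level system."""
    w = math.exp(-delta / T)
    return delta * w / (1.0 + w)


def average_over_gaps(f, T, rho=1.0, delta_max=1.0, steps=20000):
    """Trapezoid-rule integral of rho * f(delta, T) over 0 <= delta <= delta_max.

    rho is the assumed constant density of TLSs per vertex per unit gap.
    """
    h = delta_max / steps
    total = 0.5 * (f(0.0, T) + f(delta_max, T))
    for k in range(1, steps):
        total += f(k * h, T)
    return rho * h * total


s_low = average_over_gaps(tls_entropy, 0.01)
s_high = average_over_gaps(tls_entropy, 0.02)
e_low = average_over_gaps(tls_energy, 0.01)
e_high = average_over_gaps(tls_energy, 0.02)
```

Doubling $T$ should double the entropy per vertex and quadruple the excitation cost per vertex, which is the $\psi = 1$ behavior of the independent-TLS picture.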

In this argument, we used only TLSs that demonstra- 
bly were completely independent as excitations. There 
could of course be other low-energy TLSs, possibly in- 
volving moving more than one edge, that can only be 
excited conditionally on the states of other edges. But 
in general, by a TLS we will mean a compact (localized) 
excitation. We note that the above arguments do not
apply in the one-dimensional case, which however can
be solved directly. For a system of $L$ vertices with a
periodic boundary condition, the entropy and mean excitation
cost are of order $\ln LT$ and $T$ (not $\propto L^d$, unlike
the $d > 1$ cases), respectively, as $L \to \infty$ with $T$ fixed;
they can be calculated exactly for the simplest model of
independent edges each distributed uniformly in [0, 1].

So far we were careful to move edges that were not
close in cost to the percolation threshold. Now we examine
these in detail, using the simplest model, for which the
corresponding percolation problem is the uncorrelated bond
percolation model. The idea is similar to that used in the
RRN point of view in the previous section. If we run the
greedy algorithm until all edges with cost less than $p_c$ minus
of order $T$ have been tested, then we obtain a set of
clusters of size less than about $\xi_{\rm perc}(p = p_c - T)$. If we add
all edges of cost between this limit ($p_c - T$) and $p_c$ plus
of order $T$, then we obtain a giant cluster that contains a
nonzero fraction of the vertices as $|V| \to \infty$. We are interested
in the subset of these edges that connect distinct
components (which can be viewed either as clusters or as
trees) of the spanning forest for $\ell = p_c - T$; we call these
critical edges. Clearly not all of these critical edges can be
on the MST. But for the positive-temperature weighted
spanning tree problem, the many different ways of adding
a subset of the critical edges so as to obtain only trees
have similar Boltzmann-Gibbs weight. We can construct
a reduced graph that has the critical edges as its edge
set, and the connected components for $\ell = p_c - T$ as its
vertices. We will assume that the reduced graph is connected.
If we then sum over all spanning trees of this
reduced graph with the corresponding Boltzmann-Gibbs
weights, then, as the differences in cost are only of order
$T$ when any one edge is moved, this problem is approximately
a uniform spanning tree problem. It is essentially
counting all the spanning trees. As in the TLS argument,
the choice of a spanning tree on the reduced graph does
not affect the remaining edges to be added of still higher
cost, which complete a spanning tree of $G$, because the
spanning trees of the reduced graph all connect the same
vertices of $G$.
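The sum over spanning trees of a reduced graph can be sketched with the matrix-tree (Kirchhoff) theorem: the weighted sum over trees equals any cofactor of the weighted graph Laplacian with couplings $e^{-\ell/T}$. The code below is an illustration only; the four-vertex reduced graph `CRITICAL_EDGES`, its costs, and the function name are hypothetical. At $T = \infty$ all weights are 1 and the sum simply counts the spanning trees.

```python
import math


def weighted_tree_sum(n, edges, T):
    """Sum over spanning trees of prod(exp(-cost/T)) via the matrix-tree theorem.

    edges are (cost, i, j) on vertices 0..n-1; T may be float('inf').
    """
    lap = [[0.0] * n for _ in range(n)]
    for cost, i, j in edges:
        w = math.exp(-cost / T)  # Boltzmann-Gibbs weight of the edge
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    # Delete row and column 0, then take the determinant of the remainder
    # by Gaussian elimination with partial pivoting.
    m = [row[1:] for row in lap[1:]]
    size = n - 1
    det = 1.0
    for c in range(size):
        pivot = max(range(c, size), key=lambda r: abs(m[r][c]))
        if m[pivot][c] == 0.0:
            return 0.0  # graph is not connected
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            for k in range(c, size):
                m[r][k] -= factor * m[c][k]
    return det


# Hypothetical reduced graph: a square with one diagonal, with all
# "critical" edge costs lying in a narrow window around 0.5.
CRITICAL_EDGES = [(0.50, 0, 1), (0.52, 1, 2), (0.49, 2, 3),
                  (0.51, 3, 0), (0.50, 0, 2)]

tree_count = weighted_tree_sum(4, CRITICAL_EDGES, float("inf"))
```

Here `tree_count` should come out as 8 up to rounding, the number of spanning trees of a square with one diagonal. When $T$ is comparable to or larger than the spread of the costs, every tree carries a weight of similar magnitude, which is the sense in which the problem on the reduced graph is approximately uniform.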

The connectivity properties, such as the probability
that $k$ distinct branches of the tree cross between two
chosen balls (as discussed in the introduction), and the
corresponding scaling dimensions and Hausdorff dimensions,
are unaffected by TLSs of size smaller than the scale on
which these correlations are studied. But on scales larger
than $\xi_{\rm perc}(p_c \pm T)$, the argument here, which is essentially
a coarse-graining or renormalization group argument,
suggests that the connectivity properties become
those of uniform spanning trees (USTs). In the UST
problem, which corresponds to the $T \to \infty$ limit of the
weighted spanning trees, disorder (randomness) in the
costs $\ell_{ij}$ can be shown to be irrelevant, that is, it has
no effect on the large-scale universal properties. As we
have seen that temperature is a relevant perturbation of
the zero-temperature (MST) limit, it makes sense that
the crossover is to USTs at large length scales. This is
consistent with the arguments of the previous section, in
which the conductivity at large scales becomes essentially
non-random, because we can identify the non-random
$T \to \infty$ (uniform) spanning tree problem with a resistor
network with a constant resistor on each edge of the
lattice. We see again that the crossover length scale diverges
as $\xi \propto T^{-\nu}$ as $T \to 0$, with $\nu = \nu_{\rm perc}$. As noted in
the Introduction, by using the above results for the cost
and entropy, this can be interpreted as saying that for a
typical spanning tree that has cost within $1 + \varepsilon$ of $\ell_{\rm opt}$,
the length scale is $\xi \propto \varepsilon^{-\nu/(\psi+1)}$ as $|V| \to \infty$, for $\varepsilon \to 0$.

We should emphasize that saying that temperature is a 
relevant perturbation of the zero-temperature MST fixed 
point does not, in our view, entirely rule out the possible 
equivalence of the universality classes of statistical connectivity
properties in the MST at $T = 0$ and USTs.
That is because the averages are different in the two 
cases. For the MST, we mean the average of a quantity 
over the random costs with respect to which the opti- 
mum must be found. For the UST, there is a nonzero 
(or even infinite) temperature. Theoretically, it still ap- 
pears possible that the universality classes for geometric 
or connectivity properties are the same, in sufficiently 
low dimensions (indeed, in d = 1 the resulting probabil- 
ity distributions on trees are the same, though the con- 
nectivity properties are trivial). For d = 2, this would 
imply conformal invariance of the MST. However, the 
universality classes for $d = 2$ have been compared numerically
by looking at certain exponents [29,36], and while
the early results may not have ruled out their equality, 
recent numerical evidence [37] seems also to be against 
these universality classes being the same, and against the 
conformal invariance of the d = 2 MST. 

The reduced-graph (or coarse-graining) idea can be
used to estimate the contribution to the entropy of the
network of critical edges. On large length scales, the
reduced graph behaves as a finite-dimensional system.
Hence, the entropy of the uniform spanning trees formed
using the critical edges only should be of order the number
of vertices of the reduced graph. For $d < 6$, there is
of order one connected percolation cluster per correlation
volume $\xi_{\rm perc}(p_c - T)^d$, and hence the contribution to the
entropy per vertex is $\xi_{\rm perc}(p_c \pm T)^{-d} \propto T^{d\nu}$ for $d > 1$.
For $d > 6$, there are of order $\xi_{\rm perc}^{d-6}$ connected clusters per
correlation volume [21]. Hence we expect that the contribution
to the entropy per vertex is $\xi_{\rm perc}(p_c \pm T)^{-6} \propto T^3$
for $d > 6$. In either case, the result is smaller than $T$ as
$T \to 0$ when $d > 1$. For $2 < d < 6$, we predict then that
the entropy per vertex has the form

$$s \sim aT + a_1 T^2 + a_2 T^{d\nu} + \ldots \tag{27}$$

as $T \to 0$, where $a$, $a_1$, $a_2$ are non-universal coefficients.
This form can be viewed as an "analytic part", in integer
powers of $T$, which we have continued to order $T^2$ (because
$d\nu > 2$ for $d > 2$), plus a non-analytic or singular
part $T^{d\nu}$. (Such a form is familiar from ordinary critical
phenomena at nonzero temperature.) The free energy
per vertex, $f = \lim_{|V| \to \infty} F/|V|$, divided by temperature
has a similar expansion, as does the internal energy per
vertex over temperature, only the coefficients $a$, $a_i$ being
changed in obvious ways in each case. Thus, the earlier
argument that the leading term in $s$ in fact has
$\psi = 1$ is an argument that the leading effects are localized
excitations that contribute to the analytic part. A
power $\psi < 1$ would be viewed as a non-analytic part, and
would presumably indicate that the leading contribution
is from large-scale collective excitations. The singular
part $T^{d\nu}$ for $d < d_c$ is of the form expected when hyperscaling
applies in critical phenomena, except that here
it applies to $F/T$ instead of to $F$. That is because we
are dealing with a fixed point (or critical point) at zero
temperature, and the natural quantity that scales is $F/T$,
which controls the probabilities of different configurations
(whereas at a transition at nonzero $T = T_c$, one
can expand $F/T_c$ in powers of $T - T_c$). Hence we expect
on general grounds that these expansions are of the correct
form. For $d > d_c = 6$, the singular part takes the
form $T^3$, which apparently we cannot distinguish unambiguously
from the analytic part. This difference from
ordinary critical phenomena occurs because only $T > 0$
is available.

G. Implications for the eigenvalue problem at $T \neq 0$

In this section, we apply the results obtained in pre- 
vious sections from RRNs and from percolation to the 
eigenvalue problem for the matrix $\Delta$, in the regime of
strong but finite disorder, T non-zero and small. The re- 
sults of this section are not used elsewhere in this paper. 






As we saw, when $T \to 0$ in a large system, de-localized
eigenvectors (other than the zero mode) occur only at the
percolation threshold $p_c$. On the other hand, when $T$ is
non-zero there is a well-defined probability density for the
$K_{ij}$s. One then expects de-localized (in fact, extended)
eigenvectors to occur at sufficiently low eigenvalues if $d > 2$,
while for $d = 2$, the localization length diverges as the
eigenvalue $\lambda \to 0$ [18]. We also expect that for $d > 2$,
in the strong disorder limit as $T \to 0$, the fraction of
extended eigenvectors tends to zero. One would like to
understand how these two descriptions of the spectrum
are connected in the limit. We will present a partial
answer to this question.

When $T$ is small and non-zero, the method used for
$T \to 0$ breaks down when the assumption that the $K_{ij}$s for
distinct edges are very different breaks down. A typical
way for this to happen is provided by the configurations
that gave the TLS in the previous section. When $\ell_{ij}$
and $\ell_{kl}$ connect the same two clusters (zero modes of
couplings stronger than either of these), and are within $T$
of each other, then both must be included in the reduced
$2 \times 2$ block, and the eigenvectors and non-zero eigenvalue
they produce are modified, though the eigenvector is still
localized. This does not affect later eigenvectors, and in
the partition function produces the thermal effects we
have described using the TLS picture.

It is very plausible that the extended eigenvectors for
small $T$ are produced by the critical edges only, that is
those with $\ell_{ij}$ within $T$ of $\ell_c$ that connect clusters of
size of order $\xi$. This is connected with the crossover to
the UST behavior at large length scales, and to the effectively
uniform conducting medium in the RRN point
of view. Hence, one expects that using these clusters,
on length scales larger than $\xi$, the Laplacian $\Delta$ can be
represented by $\Delta_{\rm eff} = -\sigma \nabla^2$. Then the density of eigenvalues
$\lambda$ (per unit volume and per unit $\lambda$) is predicted to
be $\propto \sigma^{-d/2} \lambda^{(d-2)/2}$ as $\lambda \to 0$. This appears to be consistent
with other approaches for the $d = 1$ case, which is
essentially soluble [17,19] (and the value of $\sigma$ can also be
easily verified for this case [24]).

Next, there is the question of the behavior of the mobility
edge (the value of $\lambda$ above which, in a large system,
eigenvectors are localized), or alternatively the fraction
of eigenvectors that are extended. We will not enter into
a full study of the spectrum here, but only make a crude
estimate, which may capture the correct asymptotic behavior.
Using $\Delta_{\rm eff}$, we expect that the number of states
(per unit volume) with eigenvalue less than $\lambda$ scales as
$\propto \sigma^{-d/2} \lambda^{d/2}$ as $\lambda \to 0$. $\Delta_{\rm eff}$ is valid only for scales $> \xi$,
so this can hold only until the number of states it predicts
reaches $\xi^{-d}$. This gives a "critical" value for $\lambda$,



$$\lambda_c \propto e^{-\ell_c/T} \times \begin{cases} T^{d\nu_{\rm perc}}, & d < 6,\\ T^{3}, & d > 6, \end{cases} \tag{28}$$

as $T \to 0$. This value is our prediction for the mobility
edge for $d > 2$, though it is possible that the correct value
is larger for $d > 6$ because the number of clusters of size
$\xi$ that can be used to construct the extended states is
of order $\xi^{-6}$ per unit volume, not $\xi^{-d}$. The exponential
dependence, $e^{-\ell_c/T}$, agrees with the fact that in the $T = 0$
limit, delocalization occurs only at $\ell = \ell_c$, so only the
sub-exponential dependence on $T$ can be in question.

The fate at $T \neq 0$ of the eigenvalues of the $T \to 0$ limit
at $\ell_{ij} > \ell_c$ is a puzzle. They should remain localized,
but their density of states appears to overlap that of the
extended eigenvectors. We cannot resolve this here, and
so our description of the spectrum for $d > 2$ and for small
$T \neq 0$ must remain somewhat tentative.



III. FINITE-SIZE AND BOUNDARY-CONDITION
EFFECTS ON THE TOTAL COST

In this section we consider the effect of finite system
size on the optimum cost of the spanning tree, and of
changing the boundary conditions (imposing additional
constraints) on this minimum cost. The arguments are
largely independent of those in the last section, except
that the relation to percolation again appears. The results
take the form of a term in the subleading (in inverse
powers of system size, $L$ say) behavior of the cost that
features an exponent $\theta$, which is again related to percolation,
$\theta = -1/\nu_{\rm perc}$. Finally, we obtain a scaling form for
the free energy, which exhibits the crossover between the
zero temperature cost and the infinite-size limit at fixed
positive temperature, which is related to the results of
the previous section.



A. Finite-size scaling of the mean cost

The relation of MSTs to percolation was explained in
Section II C. In the most general case, when all edges
of cost less than $\ell$ are occupied, we have a subgraph
of $G$ which consists of one or more connected components,
called clusters (there may be clusters consisting
of a single vertex and no edges). The number of clusters will be
denoted $\mathcal{N}(\ell|G)$, and depends implicitly on the set of
edge costs $\ell_{ij}$. For $\ell < \min_{(ij)} \ell_{ij}$, $\mathcal{N}(\ell|G) = |V|$, and
for $\ell > \max_{(ij)} \ell_{ij}$, $\mathcal{N}(\ell|G) = 1$. Between these limits,
$\mathcal{N}(\ell|G)$ obviously has a sequence of downward steps of
unit magnitude. The MST for the same graph $G$ with
the same set of costs consists of those edges which, as
$\ell$ is increased from its lower to its upper limit, decrease
the number of connected clusters by 1. Then we have
the general formula for the optimum cost (without averaging):

$$\ell_{\rm opt} = -\int_{-\infty}^{\infty} d\ell\, \ell\, \frac{d\mathcal{N}(\ell|G)}{d\ell}. \tag{29}$$

It follows that the mean cost of the MST is exactly

$$\overline{\ell}_{\rm opt} = -\int_{-\infty}^{\infty} d\ell\, \ell\, \frac{d\overline{\mathcal{N}}(\ell|G)}{d\ell}, \tag{30}$$
where $\overline{\mathcal{N}}(\ell|G)$ is the mean number of connected clusters
in the corresponding percolation problem. (This idea
is certainly known to probabilists, and is contained in
Frieze's [38] exact calculation of $\overline{\ell}_{\rm opt}$ as $|V| \to \infty$ for
the case of the complete graph [i.e. one edge $(ij)$ for every
pair $i$, $j$ of vertices] with independent costs for the
edges.) For the simplest model, in which the costs are
independent and uniformly distributed in [0,1], this reduces to

$$\overline{\ell}_{\rm opt} = -\int_0^1 dp\, p\, \frac{d\overline{\mathcal{N}}(p|G)}{dp}, \tag{31}$$

which we use hereafter. For the complete graph, the result
as $|V| \to \infty$ is [38] $\overline{\ell}_{\rm opt} = \zeta(3)$, where $\zeta$ is the
Riemann zeta function. In this paper, we specialize to
graphs $G$ that are a portion $\Lambda$ of a cubic lattice in $d$ dimensions,
and we will further assume here that $\Lambda$ is a
cube of side $L$ (parallel to the lattice axes), with periodic
boundary conditions. For this system, we write the mean
number of percolation clusters as $\overline{\mathcal{N}}(p, L)$. Again, the results
found below also apply to the more general models
as delimited in the previous section. The following arguments
could be extended further to study the boundary
terms in eq. (2), or further finite-size corrections.
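The cluster-counting formula, eq. (29), can be checked on a small graph. The sketch below is only an illustration, with a hypothetical five-vertex graph `EDGES` and function names of our own choosing. It evaluates $-\int d\ell\, \ell\, d\mathcal{N}/d\ell$ as a sum over the downward steps of the cluster number, counting clusters from scratch at each cost level, and compares the result with the cost of the tree built by Kruskal's greedy algorithm.

```python
def count_clusters(n, edges, level):
    """Number of connected clusters when all edges with cost <= level are occupied."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    clusters = n
    for cost, i, j in edges:
        if cost <= level:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                clusters -= 1
    return clusters


def cost_from_cluster_steps(n, edges):
    """Sum of (cost level) x (drop in cluster number at that level)."""
    total = 0.0
    before = n  # below the smallest cost every vertex is its own cluster
    for level in sorted({cost for cost, _, _ in edges}):
        after = count_clusters(n, edges, level)
        total += level * (before - after)
        before = after
    return total


def kruskal_cost(n, edges):
    """Total cost of the MST built by the greedy algorithm."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0.0
    for cost, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            total += cost
    return total


# Hypothetical example graph with distinct edge costs.
N = 5
EDGES = [(0.9, 0, 1), (0.2, 0, 2), (0.5, 1, 2), (0.7, 1, 3),
         (0.3, 2, 3), (0.8, 3, 4), (0.4, 2, 4), (0.6, 0, 4)]

from_steps = cost_from_cluster_steps(N, EDGES)
from_kruskal = kruskal_cost(N, EDGES)
```

The two numbers should coincide, since each unit step down in the cluster number corresponds to exactly one edge of the MST.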

The function $\overline{\mathcal{N}}(p, L)/L^d$ should have a well-defined
monotonically-decreasing limit:

$$Y(p) = \lim_{L \to \infty} \overline{\mathcal{N}}(p, L)/L^d, \tag{32}$$

where the limit is taken with $p$ fixed. Thus

$$\beta = -\int_0^1 dp\, p\, \frac{dY(p)}{dp}. \tag{33}$$

The expected fraction of edges of cost between $p$ and
$p + dp$ that lie on the MST as $L \to \infty$ is

$$-\frac{1}{d} \frac{dY(p)}{dp}\, dp \tag{34}$$

for the (hyper-)cubic lattice; this function has been calculated
and plotted in Ref. [29] for some lattices in dimensions
$d = 2$ and 3 (though without making this connection
with percolation, and the singular contributions
we discuss below are not visible). There is a simple but
important relation involving $Y$, which originates from the
facts $\overline{\mathcal{N}}(1, L) = 1$, $Y(1) = 0$. It can be written as:

$$\int_0^1 dp \left[ \frac{d\overline{\mathcal{N}}(p, L)}{dp} - L^d \frac{dY(p)}{dp} \right] = 1, \tag{35}$$

and will be used below.

We may now substitute these forms to obtain a result
for $\overline{\ell}_{\rm opt}$:

$$\overline{\ell}_{\rm opt} = \beta L^d - \int_0^1 dp\, p \left[ \frac{d\overline{\mathcal{N}}(p, L)}{dp} - L^d \frac{dY(p)}{dp} \right]
= \beta L^d - p_c - \int_0^1 dp\, (p - p_c) \left[ \frac{d\overline{\mathcal{N}}(p, L)}{dp} - L^d \frac{dY(p)}{dp} \right], \tag{36}$$

using eq. (35). Notice that in more general models, the
term $-p_c$ is replaced by the value $-\ell_c$ of the cost at
the percolation threshold, as claimed in the introduction.
Next, we present arguments that the remaining integral
goes to zero as $L \to \infty$, and find its magnitude.

In percolation, $\overline{\mathcal{N}}(p|G)$ plays the role of the free energy
of a statistical mechanics problem [21] (this can be made
precise by using the relation of percolation to the $Q \to 1$
limit of the $Q$-state Potts model on the arbitrary graph
$G$). In the case of a lattice in dimension $d$, $Y(p)$ has
a singular (nonanalytic) behavior at $p = p_c$ ($p_c$ is the
percolation threshold of the infinite system), which for
$d > 2$ has the form [5]

$$Y(p) \sim Y(p_c) + (p - p_c) Y'(p_c) + \tfrac{1}{2} (p - p_c)^2 Y''(p_c) + C_\pm |p - p_c|^{2-\alpha} + \ldots \tag{37}$$

as $p \to p_c$. Here $\alpha$ is another universal exponent, $C_-$,
$C_+$ are non-universal ($d$-dependent) constants for the cases
$p < p_c$, $p > p_c$, respectively, and the leading terms on
the right hand side vanish more slowly than $|p - p_c|^{2-\alpha}$.
For $d < d_c = 6$, $2 - \alpha = d\nu_{\rm perc}$ (and apparently varies
monotonically), while $2 - \alpha = 3$ for $d > 6$. As
$2 < 2 - \alpha \le 3$ when $d > 2$, the non-analytic part of $Y$ does
not necessarily contradict the monotonic decrease of $Y(p)$
with increasing $p$. We will define

$$Y_{\rm sing}(p) = C_\pm |p - p_c|^{2-\alpha} \tag{38}$$

for all $p$, so as to match the non-analytic behavior;
$Y_{\rm sing}(p)$ will be used only in the vicinity of $p_c$. For $d = 1$,
$p_c = 1$, $\nu_{\rm perc} = 1$, $Y(p) = 1 - p$, and the singular piece
cannot be separated from the background, though $Y(p)$
does obey the expected linear form as $p \to p_c$ ($Y$ must
be positive, so cannot be smoothly continued to $p > 1$;
this can perhaps be viewed as a non-analyticity).

The idea for completing an estimate of the final integral
in eq. (36) is that the difference of derivatives in
the integrand, which must obviously be smaller than $L^d$
as $L \to \infty$, is in fact much smaller, and concentrated at
$p = p_c$. At finite $L$, $L^d\, dY/dp$, which has a nonanalyticity
at $p_c$, is replaced by $d\overline{\mathcal{N}}(p, L)/dp$, which is analytic in $p$
for all $p$ (in fact, it is a polynomial in $p$). The derivative of
the number of clusters is sensitive to the finite size of the
system only through correlation effects. Consequently,
sufficiently far from $p_c$ that $L \gg \xi_{\rm perc} \propto |p - p_c|^{-\nu_{\rm perc}}$ as
$p \to p_c$, the difference between the two functions is of order
$e^{-c' L/\xi_{\rm perc}}$. Hence, the final integral converges, and
one would expect it to be bounded by $A' L^{-1/\nu_{\rm perc}}$ [this
would follow immediately, by using the identity (35) once
again, if we had more information about the sign of the
integrand in this identity]. The following arguments provide
detailed support for this idea, and indicate that
this conservative bound is likely to be the precise order
of this correction term in most cases.

We will use the notion of finite-size scaling [39], which
generalizes the scaling statements to finite size $L$. This
follows the form for conventional equilibrium phase transitions
(see especially Refs. [40,41]), which percolation
closely resembles (some rigorous results can be found in
Refs. [33,42]). We will briefly review the form of these
arguments, so as to include the cases $d > d_c$. While
$\overline{\mathcal{N}}(p, L)$ is analytic in $p$ for finite $L$, we wish to identify
a part (traditionally termed "singular") that in the
vicinity of $p = p_c$ tends to $L^d Y_{\rm sing}(p)$ as $L \to \infty$. This
may be defined by subtracting the nonsingular part of
$Y(p)$:

$$\overline{\mathcal{N}}_{\rm sing}(p, L) = \overline{\mathcal{N}}(p, L) - L^d \left( Y(p) - Y_{\rm sing}(p) \right), \tag{39}$$

which again will be used only in the region $p \simeq p_c$. Then
according to the theory of finite-size scaling for equilibrium
phase transitions, as $L \to \infty$, $\overline{\mathcal{N}}_{\rm sing}(p, L)$ obeys the
scaling form

$$\overline{\mathcal{N}}_{\rm sing}(p, L) = n(t L^{y_t}, u L^{y_u}), \tag{40}$$

where $t = p - p_c$, $u$ is an additional parameter (a coupling
constant) that in a field theoretic calculation [43] is
treated as independent, and $y_t$ and $y_u$ are universal scaling
dimensions (which depend on $d$). The scaling form
is supposed to hold for some finite function $n$ as $L \to \infty$
with the arguments $t L^{y_t}$, $u L^{y_u}$ held fixed, and thus
applies only for $p$ close to $p_c$. The correlation length, in
an infinite system, scales as $\xi_{\rm perc} \propto |p - p_c|^{-\nu_{\rm perc}}$, where
$\nu_{\rm perc} = 1/y_t$. For $d < d_c$, $u$ renormalizes to a fixed point
value and can be dropped (unless it is desired to find corrections
to scaling). For $d > d_c$, $u$ renormalizes towards
zero ($y_u \propto d_c - d < 0$), but cannot be dropped as the
free energy $n$ depends on it in a singular fashion:

$$n(x, z) = z^{p_1} n^*(x z^{p_2}) \tag{41}$$



as z — > 0. The authors of Ref. [40] showed that p\ = 0, 
and this should also hold for percolation. Then the mean 
number of clusters takes the form 



Afsin g {p,L) = | 



n{tL,v*) tovd<d c 



(42) 



in which $u^{p_2}$ has been absorbed into the non-universal
scale factors that accompany $t$. Here $y_t^* = y_t + p_2 y_u$,
and for percolation the field-theoretic formulation leads
to $y_u = (6 - d)/2$, $p_2 = -2/3$, $y_t^* = d/3 = y_t d/d_c$ [44]
for $d > 6$. The implication of these scaling statements is
that the analytic background that has been subtracted
has negligible (exponentially small) $L$ dependence, even
at $p_c$. The finite-size scaling form given should be of order
$L^d$ as $L \to \infty$ with $t$ fixed, and must match $L^d Y_{\rm sing}(p)$,
so we must have

$$\begin{aligned} n(tL^{y_t}) &\sim C_\pm L^d |t|^{d/y_t} \propto L^d \xi_{\rm perc}^{-d} && \text{for } d < d_c, \\ n^*(tL^{y_t^*}) &\sim C_\pm L^d |t|^{d/y_t^*} \propto L^d \xi_{\rm perc}^{-6} && \text{for } d > d_c, \end{aligned} \tag{43}$$

for $|t| L^{y_t}$ (resp., $|t| L^{y_t^*}$) large, for both signs of $t$. These
scaling behaviors are consistent with the above forms for
$\alpha$, and $L^d Y_{\rm sing}(p)$ itself satisfies the same scaling behavior
as $\mathcal{N}_{\rm sing}(p, L)$: $L^d Y_{\rm sing}(p) = Y(tL^{y_t})$ [$Y(tL^{y_t^*})$ for $d > d_c$].
For $d = d_c$ there may be logarithmic corrections to
these scaling forms, which we will neglect.
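
As a consistency check on the exponents quoted for $d > d_c = 6$, the arithmetic can be written out explicitly. This is only a worked restatement, assuming the mean-field values $y_t = 2$ (that is, $\nu_{\rm perc} = 1/2$) together with the values of $y_u$ and $p_2$ given in the text:

```latex
\begin{align*}
y_t^* &= y_t + p_2\, y_u = 2 - \tfrac{2}{3}\cdot\tfrac{6-d}{2} = \tfrac{d}{3}, \\
|t|^{d/y_t^*} &= |t|^{3} = \left(|t|^{-1/2}\right)^{-6} \propto \xi_{\rm perc}^{-6}, \\
n^*(tL^{y_t^*}) &\sim C_\pm \left(|t|\, L^{d/3}\right)^{3} = C_\pm L^d |t|^{3}
  \qquad \text{for } |t|\, L^{y_t^*} \text{ large}.
\end{align*}
```

The last line shows how the requirement of a result of order $L^d$ at fixed $t$ selects the power $d/y_t^* = 3$ of the scaling variable.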

Now the integral in eq. (36) contains
only $d\mathcal{N}_{\rm sing}(p, L)/dp - L^d\, dY_{\rm sing}(p)/dp$, and, for $d < d_c$
only, can be rewritten using the scaling behavior in terms
of $x = tL^{y_t}$ (we turn to the $d > d_c$ cases below):

$$L^{-y_t} \int dx\, x \left[ \frac{dn(x)}{dx} - \frac{dY(x)}{dx} \right]. \tag{44}$$



The difference of derivatives is expected to behave as
$e^{-c''|x|^{1/y_t}}$ for some $d$-dependent constant $c''$ at large $|x|$,
because the leading error is due to correlations that propagate
around the system, and will involve the linear size
$L/\xi_{\rm perc}$. It follows that the integral converges, and we
have obtained

$$\overline{\ell}_{\rm opt} \sim \beta L^d - p_c + \lambda' L^\theta \tag{45}$$

as $L \to \infty$, with $\theta = -y_t$.

As an aside, we point out that $n(x) - Y(x)$ cannot go
to zero at large positive $x$, but must approach 1 as $p \to 1$.
We have pointed out that $\mathcal{N}(p, L) - L^d Y(p)$ approaches 1
as $p \to 1$; now we are arguing that this difference of order
1 exists all the way to the vicinity of $p = p_c$, and so the
integrand in eq. (35) behaves as a $\delta$-function when $L \to \infty$.
This effect is due to the "giant" percolation cluster that
occupies a positive fraction of vertices when $p > p_c$. If we
start at $p = 1$, and decrease $p$, then edges are removed
at random. Some of these removals disconnect some vertices
from the giant cluster. However, the resulting value
of $L^{-d}\, d\mathcal{N}(p, L)/dp$ has only small finite-size corrections,
of relative order $e^{-cL/\xi_{\rm perc}}$. The giant cluster does not
disappear until the critical region is reached (where it
cannot be distinguished from clusters of size $\xi_{\rm perc} \sim L$),
and so $\mathcal{N}(p, L) - L^d Y(p)$ remains close to 1 down to the
same region.

A useful check on the arguments is provided by the $d = 1$
case, in which $\mathcal{N}(p, L) = L(1 - p) + p^L$, $n(x) - Y(x) = e^x$
($x < 0$ for $d = 1$). Thus

$$\overline{\ell}_{\rm opt} = L/2 - 1 + L^{-1} \tag{46}$$

in $d = 1$ (higher terms are of order $L^{-2}$), that is $\beta = 1/2$,
$\lambda' = 1$. It is likely that $\lambda' > 0$ for all $d$.
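
The $d = 1$ check can be reproduced with exact rational arithmetic. The sketch below assumes edge costs that are independent and uniform on $[0, 1]$, and uses the fact that the greedy algorithm gives a mean cost equal to the integral over $p$ of $\mathcal{N}(p, L) - 1$; the function names are illustrative only.

```python
from fractions import Fraction


def mean_clusters_integral(L):
    """Exact integral over p in [0, 1] of N(p, L) - 1,
    with N(p, L) = L(1 - p) + p**L (ring of L vertices and L edges)."""
    return Fraction(L, 2) + Fraction(1, L + 1) - 1


def ring_mst_mean_cost(L):
    """Exact mean MST cost on a ring with i.i.d. uniform[0, 1] edge costs.
    The MST drops the most expensive edge, whose mean is L / (L + 1)."""
    return Fraction(L, 2) - Fraction(L, L + 1)


for L in (4, 16, 64, 256):
    exact = ring_mst_mean_cost(L)
    assert exact == mean_clusters_integral(L)
    asymptotic = Fraction(L, 2) - 1 + Fraction(1, L)  # the form of eq. (46)
    # The remainder is -1 / (L (L + 1)), i.e. of order L**-2.
    print(L, float(exact), float((exact - asymptotic) * L**2))
```

The last printed column approaches $-1$, which is the coefficient of the $L^{-2}$ term in this toy case.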

For $d > d_c$, the use of the scaling forms with $n^*$ in
place of $n$ would lead to the final integral being of order
$L^{-y_t^*}$. This result is incorrect. The error is that
while the scaling form for $\mathcal{N}_{\rm sing}(p, L)$ correctly describes
the leading behavior as $p \to p_c$, the integral we wish
to calculate contains the difference of derivatives, from
which the leading part has been subtracted. It turns out
that there is a subleading part of $\mathcal{N}(p, L)$ that dominates
this subtracted form. Mean-field theory yields a nonanalytic
contribution to $\mathcal{N}(p, L)$ that is precisely of the form
$L^d Y_{\rm sing}(p)$ near $p_c$. The leading correction to $L^d Y(p, L)$
due to Gaussian fluctuations at all wavevectors (within
a field-theoretic formulation) is $\sim C'_\pm L^d |t|^{d\nu_{\rm perc}}$ (times
$\ln |t|$ when $d$ is even), which is smaller than $L^d Y_{\rm sing}(p)$ as
$t \to 0$. [For comparison, for $d > d_c$, the universal scaling
function $n^*(tL^{y_t^*})$ comes entirely from the "zero-mode"
fluctuations [41].] However, when $L^d Y(p)$ is subtracted,
the leading singularity $L^d Y_{\rm sing}(p)$ is removed, and so is
$C'_\pm L^d |t|^{d\nu_{\rm perc}}$, but a finite-size correction to the latter
remains. This finite-size correction is of the form

$$\mathcal{N}(p, L) - L^d Y(p) \sim \sum_{\mathbf{q}} \ln(q^2 + |t|) - L^d \int \frac{d^dq}{(2\pi)^d} \ln(q^2 + |t|). \tag{47}$$



The ultraviolet divergence in this expression is cut off on
the lattice; $q^2$ is replaced by a lattice expression that is
periodic over the Brillouin zone (to which the sum and
integral are restricted), and which reduces to $q^2$ at small
$q$. The sum is over wavevectors $\mathbf{q} = 2\pi(n_1, \ldots, n_d)/L$,
where the $n_i$ are integers. Some numerical factors multiplying
$t$ have been neglected. One finds that $\mathcal{N}(p, L) - L^d Y(p)
\propto e^{-L|t|^{1/2}}$ as $L \to \infty$. This correction is significant
when $|t| < L^{-1/\nu_{\rm perc}}$. For $d > d_c$, the region
$|t| < L^{-1/\nu_{\rm perc}}$ is much larger than $|t| < L^{-y_t^*}$, within
which the other effects are important. In the wider region,
the Gaussian fluctuations dominate, as the interaction
term $u$ is weak (and perturbation theory is infrared
convergent for $d > d_c$). The contribution of the giant
cluster also is significant over the same window. Then
$d\mathcal{N}(p, L)/dp - L^d\, dY(p)/dp$ possesses a scaling limit that
is a function of $tL^{y_t}$ only, where $y_t = 2$ in this case. Hence
the rescaling argument in this case produces $\lambda' L^{-1/\nu_{\rm perc}}$
also. There are also other corrections for $d > d_c$, including
an effective finite-size shift in the value of $p_c$, of
order $L^{-(d-4)}$, which is smaller than the width of the
critical region $|p - p_c| \propto L^{-y_t}$. This shift contributes an
amount of order $L^{-(d-4)}$ to $\overline{\ell}_{\rm opt}$, smaller than $L^\theta$. For
$d < d_c$, all fluctuation effects are of similar order as the
leading mean-field term, and have to be resummed using
the renormalization group; they contribute to the same
universal scaling functions $n$ and $Y$, and the present
arguments for $d > d_c$ do not apply there.
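
The exponential decay of a "sum minus integral" of this type can be seen in a one-dimensional toy version, which is not the $d > d_c$ case of interest and is meant only as an illustration. The sketch assumes the lattice replacement $q^2 \to 2 - 2\cos q$ and a positive mass term $t$; with $\cosh\kappa = 1 + t/2$, the Brillouin-zone integral of the logarithm equals $\kappa$ and the difference has the closed form $2\ln(1 - e^{-L\kappa})$, with $\kappa \simeq t^{1/2}$ at small $t$.

```python
import math


def gaussian_finite_size_correction(L, t):
    """Sum over the L allowed wavevectors minus L times the Brillouin-zone
    integral of ln(q_lat^2 + t) for a 1D periodic lattice, where
    q_lat^2 = 2 - 2 cos(q).  The integral is kappa = arccosh(1 + t / 2).
    Returns (difference, kappa)."""
    total = sum(
        math.log(2.0 - 2.0 * math.cos(2.0 * math.pi * n / L) + t)
        for n in range(L)
    )
    kappa = math.acosh(1.0 + t / 2.0)
    return total - L * kappa, kappa


t = 0.1
for L in (10, 20, 40):
    diff, kappa = gaussian_finite_size_correction(L, t)
    closed_form = 2.0 * math.log1p(-math.exp(-L * kappa))
    # Both columns decay like exp(-L * kappa), with kappa close to sqrt(t).
    print(L, diff, closed_form)
```

The overall sign and normalization here belong to the toy model and should not be read as those of eq. (47).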

The generalization to finite sizes with periodic boundary
conditions, but for a cuboid of general aspect ratio
(held fixed as $L \to \infty$) in place of the hypercube,
is straightforward. Another generalization is to a
long cylinder, of length $L$, and hypercubic with periodic
boundary conditions with period $W$ in the $d - 1$ transverse
dimensions. In this case, the mean optimum cost
per unit length tends to a $W$- (and $d$-) dependent limit as
$L \to \infty$, and by similar arguments (using methods from
Ref. [41] for this geometry) this behaves as

$$\lim_{L \to \infty} \overline{\ell}_{\rm opt}/L \sim \beta W^{d-1} + \lambda'' W^{\theta - 1} \tag{48}$$

with the same exponent $\theta$, as $W \to \infty$.

Finally, the application of similar ideas to the simplest
model MST on the complete graph with $N = |V|$ vertices,
to obtain finite-$N$ corrections to the result of Ref. [38],
should give (using an analysis like Ref. [41], and similar to
the $L^{-y_t^*} = N^{-1/3}$ for $d > d_c$ that we argued is incorrect
in the finite-$d$ lattice case, but should be correct here)

$$\overline{\ell}_{\rm opt} \sim \zeta(3) - 1/N + \lambda''' N^{-4/3} \tag{49}$$

[we note that the percolation threshold is $p_c = 1/N$ (see
e.g. Ref. [45]), and all terms are smaller by $1/N$ than in
the lattice cases].
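
The leading term $\zeta(3) \approx 1.2021$ for the complete graph can be checked by a small Monte Carlo experiment. The sketch below is an illustration only: it assumes independent costs uniform on $[0, 1]$ (so that the total cost, not a rescaled one, tends to $\zeta(3)$), uses Prim's algorithm for convenience rather than the greedy algorithm discussed in the text, and is far too crude to resolve the finite-$N$ corrections.

```python
import random


def complete_graph_mst_cost(n, rng):
    """Cost of the MST of the complete graph on n vertices with i.i.d.
    uniform[0, 1] edge costs, using Prim's algorithm in O(n^2) time.
    Each edge cost is drawn lazily, exactly once, when the first of its
    two endpoints joins the tree."""
    best = [float("inf")] * n   # cheapest known edge from the tree to each vertex
    in_tree = [False] * n
    in_tree[0] = True
    total = 0.0
    for _ in range(n - 1):
        nxt, nxt_cost = -1, float("inf")
        for v in range(n):
            if in_tree[v]:
                continue
            c = rng.random()    # cost of the edge from the newest tree vertex to v
            if c < best[v]:
                best[v] = c
            if best[v] < nxt_cost:
                nxt, nxt_cost = v, best[v]
        in_tree[nxt] = True
        total += nxt_cost
    return total


rng = random.Random(0)
n, samples = 200, 20
mean = sum(complete_graph_mst_cost(n, rng) for _ in range(samples)) / samples
print(mean)  # to be compared with zeta(3) = 1.2020569...
```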



B. Effect on the mean cost of a change of boundary 
condition 



In this subsection, we consider the long-cylinder geometry,
described in the introduction and at the end of section
III A. We consider the effect of a change in boundary
condition, that is imposed by demanding that the minimum
cost spanning tree have $k$ distinct branches crossing
from one end to the other, instead of the one that is typical
for the usual MST. We call the minimum cost for this
constrained spanning tree $\overline{\ell}_{\rm opt}(k)$. Thus, outside of the
end regions of the cylinder, there are (at least) $k$ trees,
forming a spanning forest of minimum cost. This type of
change of boundary condition could be handled by the
Hamiltonian methods described in Ref. [41], if we had
a direct field theory for the MST problem. This would
lead us to expect the change in optimum cost per unit
length to scale the same way as the finite $W$ correction
to the optimum, that is as $W^{\theta - 1}$. This expectation is
correct, but as such a formulation is not presently available,
we will turn to a different approach, which produces
an upper bound, and which can also be applied to other
combinatorial optimization problems.

The idea is to begin with the MST on the long cylinder
without the additional constraint, and now modify it so
as to grow $k - 1$ additional disconnected trees that extend
from one end to the other. This must increase the
total cost, and we estimate the resulting increase, thus
producing an upper bound on this change.

It is useful to give two versions of this procedure; the
first version is simple and produces a rather conservative
bound, while the second, more refined upper bound is
tighter. When expressed in terms of an exponent $\theta$, which
should be the same as the other $\theta$s in this paper, the first
says that $\theta \le 0$, and the second that $\theta \le -1/\nu_{\rm perc}$. The
second bound presumably cannot be tightened further in
most cases.

We begin with some definitions for the MST on a long 
cylinder. There is a path on the tree from one end of the 
cylinder to the other, which with probability approaching
1 as $L/W \to \infty$ is unique outside the end regions (of
length of order W). As the end regions are unimportant, 
this path is essentially unique, and we will refer to it as 
the trunk of the tree. The remainder of the tree consists 
of side-branches, which are trees rooted on the trunk; the 
side-branches presumably have linear size of order W or 
less. The basic procedure, which we describe for $k = 2$ as
the generalization to $k > 2$ is simple, is to modify the tree
in a sequence of steps so as to grow a second tree, dis- 
tinct from what remains of the first one except in the end 
regions, that possesses a trunk extending from one end 
to the other of the cylinder. This is done by beginning 
at one end of the cylinder, and cutting off parts of side- 
branches (by removing an edge) from the original MST 
and joining them to the new tree. Each side-branch of 
the original tree that is cut must be adjacent to the new, 
growing tree so that it can be reattached to it, by includ- 
ing an edge that was not part of the original MST. We 
end up with two disjoint trees, which together span the 
vertices, one of which has the same trunk as the original 
MST. The side-branches, and the cutting and attaching 
edges, are selected so as to minimize the increase in cost 
of the final k trees relative to the MST. 

In the first, simple procedure, at each step we look for
a side-branch attached to the trunk of the original MST
that is adjacent to the growing tree, and which extends
in the growth direction. This will typically be of size $W$,
and will touch the growing tree at a distance of order
$W$ from the trunk of the MST. The cut is made at an
arbitrary point between the re-attachment point (which
is also chosen arbitrarily) and the trunk. The growing
tree thus grows by order $W$ towards the target end. The
number of steps required will be of order $L/W$, and each
increases the cost by order one, so the change in cost is
of order $L/W$, and the pair of trees constructed provides
an upper bound on the true minimal increase in cost
relative to the MST. For the general $k$-tree version, $k - 1$
additional trees can be grown in parallel, and each step
makes progress by $W/k^{1/(d-1)}$, similar to arguments in
Ref. [13]. The total change in cost is then roughly of
order $k\, k^{1/(d-1)} L/W$.

In the second, improved version, we will recognize that
the selection of edges to cut and to add can be optimized
to significantly reduce the increase in cost per step. In
fact, the edges that will be moved will again be "critical
edges", here meaning those with cost within $W^\theta$ of $\ell_c$.

If the greedy algorithm is applied to any one of our
models on the cylinder, then we can run it up to a value
of $\ell < \ell_c$ such that $\xi_{\rm perc} < W$, say $\xi_{\rm perc} = W/10$. If $W$
is large, this means $\ell = \ell_c$ minus of order $W^{-1/\nu_{\rm perc}}$. At
this stage, there are many clusters of size $\xi_{\rm perc}$, but it is
rare for a cluster to percolate around the "circumference"
$W \ll L$ of the cylinder (the probability in a length of order
$W$ along the cylinder is of order $e^{-cW/\xi_{\rm perc}}$). We will
ignore these exceptional cases, for now, and return to this
oversimplification later. Now we continue the greedy algorithm
up to $\ell = \ell_c$ plus of order $W^{-1/\nu_{\rm perc}}$. There will
now be many large clusters that have size $\gg W$ along the
long direction of the cylinder, and which together occupy
a positive fraction of vertices as $L \to \infty$. However, we
cannot guarantee that there is a single giant cluster that
percolates the full length of the cylinder with probability
one. There is always a nonzero probability that the
cluster is broken somewhere, even though this probability
may be exponentially small in $W$. (For finite $W$, on
length scales $> W$ the problem maps onto an effectively
one-dimensional percolation problem, in which $p_c = 1$.)
In fact, if above $\ell_c$ the correlation length is $\xi_{\rm perc}$, then
the probability per unit length of a break in the cluster
is of order $e^{-(W/\xi_{\rm perc})^{d-1}}$ when $W > \xi_{\rm perc}$. This will
not affect the argument, and we may continue as if there
is a giant cluster and a path on the corresponding tree 
that runs from one end to the other (this path will be 
the trunk of the MST when the greedy algorithm is fin- 
ished). After giving the argument under this simplifying 
but incorrect assumption, we will return to and correct 
for this oversimplification also. 
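
The procedure of running the greedy algorithm up to a chosen cost threshold can be sketched in code. The following is a minimal illustration, not the construction used in the argument: it assumes a two-dimensional cylinder (square lattice, periodic with period `width` in the transverse direction, free ends along the length), independent costs uniform on $[0, 1]$, and a hypothetical function name. For this lattice the bond percolation threshold is $1/2$, so thresholds slightly below and above $1/2$ play the roles of $\ell_c \mp W^{-1/\nu_{\rm perc}}$.

```python
import random


def clusters_at_threshold(length, width, threshold, seed=0):
    """Run Kruskal's greedy algorithm on a length x width square-lattice
    cylinder (periodic in the width direction, free ends), accepting only
    edges with cost <= threshold.  Returns (number of clusters, cost of
    the accepted forest)."""
    if width < 3:
        raise ValueError("width must be at least 3 to avoid duplicate edges")
    rng = random.Random(seed)
    edges = []
    for x in range(length):
        for y in range(width):
            here = x * width + y
            # Transverse edge, periodic with period `width`.
            edges.append((rng.random(), here, x * width + (y + 1) % width))
            # Longitudinal edge; the two ends of the cylinder are free.
            if x + 1 < length:
                edges.append((rng.random(), here, here + width))
    edges.sort()

    parent = list(range(length * width))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    clusters, cost = length * width, 0.0
    for c, a, b in edges:
        if c > threshold:
            break
        ra, rb = find(a), find(b)
        if ra != rb:            # the edge joins two distinct clusters
            parent[ra] = rb
            clusters -= 1
            cost += c
    return clusters, cost


for threshold in (0.3, 0.5, 0.7, 1.0):
    print(threshold, clusters_at_threshold(64, 8, threshold))
```

With the threshold set to 1 every edge is considered, the forest becomes the MST, and the cluster count reaches 1.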

We can choose $\ell - \ell_c$ large enough so that there are
actually two (or more generally, $k$) paths from one end of
the cylinder to the other on the giant percolation cluster,
which have no edges or vertices in common with one
another. We will assume that the paths are separated
by of order $W/2$ (or $W/k^{1/(d-1)}$ for $k > 2$) along almost
all of their length (again, this may be an oversimplification,
but should not affect the scaling). Now on the
corresponding tree (which is a subset of the edges of the
cluster), there is only one path (or trunk) running from
one end to the other. Take the trunk as one of the two
disjoint paths on the cluster. The second path runs along
the tree, but suffers many breaks at edges that are part
of the cluster but not of the tree. The parts of the path
that are edges on the tree lie on side-branches off the
trunk, and typically some of the edges that connect this
path to the trunk were not present at $\ell = \ell_c - W^{-1/\nu_{\rm perc}}$.
If we take this tree and remove one of these edges, and
replace them with the edges on the cluster that complete
the second path, then we have satisfied the constraint on
the tree, and the remainder of edges of the MST can be
added to these two trees without producing any cycles.
Thus we have constructed two trees that together span
the vertices, with two disjoint paths running from one
end to the other, at an increase in cost of order $W^\theta$ per
length $W$, with $\theta = -1/\nu_{\rm perc}$. Note that, as in the simple
version of the argument, we expect that only of order
one edge (i.e. a fixed number as $W \to \infty$) must be moved
per length $W$ along the cylinder in order to construct the
second path.

The existence of breaks on the trunk of the MST when
the algorithm stops at $\ell$ when the tree is not spanning
does not affect the above argument (after all, we can
easily ensure that the second path is disjoint from the
whole trunk of the MST). The second path that is constructed
will also have breaks on it. These can be filled
as $\ell$ increases further. They become exponentially rare
when $W \gg \xi_{\rm perc}$, so that the increase in cost for moving
edges to construct the second trunk will converge to $W^\theta$
per length $W$, as claimed. Similarly, the clusters that
encircle one (or more) of the periodic directions of the
cylinder when $\ell = \ell_c - W^{-1/\nu}$ are avoided if we go to
even smaller $\ell$. The total contribution of these events
will converge and still scale as claimed.

The refined version of the argument thus suggests that
the change in cost per unit length is bounded by, and
most likely actually of order of, $W^{\theta - 1}$ times $k$- and
$d$-dependent factors, as claimed earlier,

$$\lim_{L \to \infty} \frac{\overline{\ell}_{\rm opt}(k) - \overline{\ell}_{\rm opt}}{L} \sim \lambda'_k W^{\theta - 1} \tag{50}$$

as $W \to \infty$.



C. Scaling at finite size and positive temperature 

Our final topic for MSTs will be the combined effects
of small positive temperature and finite size. We again
assume the system is a hypercube of side $L$ with periodic
boundary conditions. We consider the mean free energy
$\overline{F}$, where $F = -T \ln Z$, and subtract the non-singular
part, as in the theory of finite-size scaling for critical
phenomena at non-zero temperature discussed in section
III A. The non-singular part takes the form $L^d(\beta + a'' T^2 +
a'_1 T^3) - \ell_c$ (there is a possibility of terms of order $T^2 L^0$
also). For the singular part $\overline{F}_{\rm sing}$ we have the scaling
form

$$\overline{F}_{\rm sing}(T, L) = T \mathcal{F}(TL^{y_T}) \tag{51}$$



for $d < d_c = 6$. Here the exponent $y_T$, the scaling dimension
of $T$, will turn out to be $y_T = -\theta = 1/\nu$. The
factor of $T$ occurs because (as mentioned in section II F)
it is $F/T$ that scales similarly to $F$ in the nonzero temperature
critical phenomena case. The scaling function
$\mathcal{F}(x)$ is a function of the natural scaling combination
$x = TL^{1/\nu}$, and scaling is supposed to hold as $T \to 0$
(and $L \to \infty$) with $x$ fixed. It has the limiting behavior

$$\mathcal{F}(x) \propto \begin{cases} x^{d/y_T} & \text{as } x \to \infty, \\ x^{-1} & \text{as } x \to 0. \end{cases} \tag{52}$$



These two limits reproduce the results of the previous sections,
in the two limits $L \to \infty$ with $T$ fixed, and $T \to 0$
with $L$ fixed, provided $y_T = -\theta$. We emphasize that at
finite $L$, the explicit average over the disorder is required,
as $F$ itself is subject to fluctuations in the scaling limit.
It should be possible to describe the statistics of the fluctuations
in the singular part of $F$ by scaling forms with
the same exponents, also. Note that the non-singular
part we subtracted included the non-singular subleading
(in terms of $1/L$) term $-\ell_c$, so that the scaling function
exposes the parts with non-trivial exponents, such as $d\nu$
or $\theta$ in the two limits.
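
For reference, the two limits can be written out explicitly. This is a worked restatement, assuming the limiting forms $\mathcal{F}(x) \propto x^{d/y_T}$ at large $x$ and $\mathcal{F}(x) \propto x^{-1}$ at small $x$ in eq. (52), with $x = TL^{y_T}$ and $y_T = 1/\nu = -\theta$:

```latex
\begin{align*}
x \to \infty \ (L \to \infty,\ T \text{ fixed}):&\quad
T\,\mathcal{F}(x) \propto T\,(T L^{y_T})^{d/y_T} = L^d\, T^{1 + d/y_T} = L^d\, T^{1 + d\nu}, \\
x \to 0 \ (T \to 0,\ L \text{ fixed}):&\quad
T\,\mathcal{F}(x) \propto T\,(T L^{y_T})^{-1} = L^{-y_T} = L^{\theta}.
\end{align*}
```

The first line is extensive, as a bulk free energy must be, and the second is independent of $T$, as a zero-temperature finite-size correction must be.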

For $d > d_c$, we find some difficulty in obtaining a convincing
scaling form that reproduces the limits in previous
sections. This is due to hyperscaling being violated
in the positive temperature results ($F(T, L)_{\rm sing} \propto L^d T^4$),
but not in the finite size results ($F(0, L) - \beta L^d + \ell_c \propto
L^{-1/\nu}$). Possibly the problem is due to the singular part
of the positive temperature result not being unambiguously
distinguishable from the analytic behavior, as we
have already discussed. Likewise, the finite-size contribution
at $T = 0$ is due to long-range correlation effects, but
is an integer power of $L$ ($L^{-2}$). It cannot in principle be
distinguished from a nonsingular part of the same order.
Even though we did not find such a term, we did have
to subtract a term of order $L^0$. As we saw in the case
of percolation, above the critical dimension there may
be contributions to the free energy that scale in distinct
ways. We suspect that we must write the general form
as

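$$\overline{F}_{\rm sing}(T, L) = T \mathcal{F}_1(TL^{y_T}) + T \mathcal{F}_2(TL^{y_T^*}). \tag{53}$$

The functions in this expression have the limiting behavior

$$\mathcal{F}_1(x) \propto \begin{cases} O(x^{d/y_T}) & \text{as } x \to \infty, \\ x^{-1} & \text{as } x \to 0, \end{cases} \tag{54}$$

and $y_T = -\theta = 1/\nu = 2$ for $d > d_c = 6$, while

$$\mathcal{F}_2(x) \propto \begin{cases} x^{d/y_T^*} & \text{as } x \to \infty, \\ O(x^{-1}) & \text{as } x \to 0, \end{cases} \tag{55}$$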



where $y_T^* = y_T d/d_c = d/3$ for $d > 6$. (Here, as usual,
$X = O(Y)$ as $Z \to \infty$ means $|X/Y|$ is bounded as $Z \to
\infty$.) Each of the two previous scaling limits is reproduced
by one of these two functions, while the other is smaller 
in that limit. In the two scaling limits of this paragraph, 
in each of which some combination of T and L is held 
fixed in the limit, one of the two functions dominates 
(and takes a limit form calculated in one of the previous 
sections) , while the other (the one that is a function of the 
combination held fixed) describes subleading corrections. 
A more complete study of this issue would be of interest. 



IV. OTHER OPTIMIZATION PROBLEMS 

In this section we consider possible extensions of the results
to other combinatorial optimization problems that
have a geometric flavor.

The first one to mention is the minimum Steiner tree 
(MStT) problem [2,46]. In its Euclidean version, there 
are N "mandatory" points marked in a region A, and we 
must find a tree that visits all of them with minimum 
total Euclidean length for its edges, similar to the Eu- 
clidean MST, but now it is allowed to have vertices of 
the tree that are not mandatory. There is also a version 
on a graph, in which a subset of the vertices are mandatory,
costs are assigned to the edges, and a minimum cost
tree must be found that visits all the mandatory points. 
While the MST can be solved in a time polynomial in 
|V| (using e.g. the greedy algorithm [1]), the MStT is 
NP-hard (i.e. the decision version, asking the question 
whether there exists a Steiner tree with cost less than 
some given value, is NP-complete [47]) and presumably 
cannot be solved in polynomial time. Both optimization 
problems produce a tree that (in the random version of 
the problem) fills space on large scales (with high proba- 
bility), thus similar connectivity and boundary-condition 
properties can be defined. It is plausible that the scaling 
dimensions for the MStT are the same as for the MST, 
including $\theta$ and $\nu$ as defined in this paper. This would
be analogous to universality arguments in statistical me- 
chanics problems such as Ising spin problems, in which 
universality classes can be distinguished on the basis of 
the locality and symmetry of the Hamiltonian and of the 
type of disorder involved. For geometric problems of the 
type considered here, there are no local order parameters 
(analogous to spins), but topological properties such as 
the connectivity we have used should take their place. 

We can consider coarse-graining methods, which we 
here describe schematically. Coarse graining, or renor- 
malization, is designed to preserve the properties that de- 
fine universality classes. If we consider the points within 
a window of size W within the sample, then the tree 
passes through its boundary at one or more points (with 
probability approaching 1 as $W$ increases). Only the fact
that each of these is or is not connected through the in- 
terior of the window to each other such point (for the 
given window) is relevant to the tree outside. Thus min- 
imization of the cost over the interior can be performed 
for each such boundary condition. If the system is par- 
titioned into such windows of equal size, then patching 
together the windows subsequently, one can minimize the 
total cost in stages that are performed locally, at the cost 
of storing a large amount of information about the re- 
sults for different boundary conditions. The information 
that needs to be stored is reduced by coarse-graining, 
that is assuming that fine details of the structure will 
not be important. In particular, in low dimensions (less 
than eight [4]) there will typically be only a finite num- 
ber of large (size of order W) trees visible within each 
window, even for large windows. The reduced objects 
can be represented as trees, but with a lower density 
of vertices. These are the usual ideas of the renormal- 



ization group, applied to geometric objects. In general, 
the cost for given connections within a window will de- 
pend on the connections in a complicated way, and can- 
not be expressed simply as a sum over some "occupied" 
edges. One property that should be maintained as coarse- 
graining proceeds is that if two disconnected portions are 
connected, the cost will increase. Thus, the simple form 
of the cost for the MST, and the less simple (in terms 
of the mandatory vertices) form for the MStT, are just 
two examples, and all models will become more com- 
plicated under coarse-graining anyway. It is then likely 
that the universality classes (one for each dimension d) 
in which all the (short-range correlated, $d$-dimensional)
MST problems lie are actually larger and contain some 
more general tree-optimization problems. Hence it is not 
at all implausible that the MST and MStT are in the
same universality class for each d. 

There are also other popular problems, such as
the traveling salesman problem (TSP), and minimum
weighted matching. The scaling forms for various quantities
given in previous sections should also apply to
these (in their $d$-dimensional version), though the universal
numbers, including the exponents and critical dimensions,
may be different. For the TSP, we can define
$\theta$ from the finite-size correction to the total cost,
say for periodic boundary conditions on a hypercube, as
$\overline{\ell}_{\rm opt} = \beta L^d + \lambda' L^\theta + \ldots$. For the TSP, Rhee [48] raised
the question (for $d = 2$) of whether for periodic boundary
conditions, in our notation, $(\overline{\ell}_{\rm opt} - \beta L^2)/L \to 0$ as
$L \to \infty$. Our answer to this question would be affirmative.
We note that the order one term in $\overline{\ell}_{\rm opt}$ for MST
with periodic boundary conditions can be traced back to
the fact that $\ell$ is a sum of $|V| - 1 = L^d - 1$ terms, not
$|V| = L^d$. For the TSP, $\ell$ is a sum of exactly $|V|$ terms.
Alternatively, we can define $\theta$ by considering the change
in cost when the tour is required to travel from one end
of a cylinder to the other $k$ times, as in Ref. [11].

For the TSP at nonzero temperature, no phase transition
is found in mean-field theory [49], and so we expect
none in any dimension $d$. The high-temperature
limit of TSP is a sum over all tours of the graph, so
could be called "uniform Hamiltonian cycles", but this
is also essentially what is called dense polymers (self-avoiding
walks constricted in volume). However, we
should caution that uniform Hamiltonian cycles on some
two-dimensional lattices are known to be in different universality
classes from the more generic dense polymers;
these are called fully-packed loop models. In dense polymers,
weak disorder is an irrelevant perturbation, so it
is reasonable to imagine that the renormalization group
can flow to the high-temperature fixed point. Given the
absence of a transition at finite non-zero $T$, we expect
that any positive temperature is relevant, and so that
$\theta \le 0$. Assuming that $\theta < 0$, there will be a crossover
length $\xi \propto T^{-\nu}$ that diverges as $T \to 0$, with again the
scaling relation $\nu = -1/\theta$. We can also try to bound $\theta$
as in section III B. In the absence of detailed information,
we can still use an argument similar to the simple
bound given there. In particular, in two dimensions, the
tour is equivalent to the boundary of a tree, so that the
argument is really the same, and we conclude again that
$\theta \le 0$. In Ref. [11], it was assumed that $\theta = 0$ for $d = 2$,
and some support for this was found numerically.

More speculatively, since the two-dimensional TSP is
equivalent to minimizing a complicated but local cost
function for a tree, the type of coarse-graining arguments
outlined above suggest that $\theta_{\rm TSP}(d = 2) = \theta_{\rm MST}(d =
2) = -3/4$ (and that other corresponding exponents also
are equal, as suggested in Ref. [11]). Even if this suggestion
is correct, the universality classes for TSP in dimensions
$d > 2$ do not have to join smoothly with the MST
class at d = 2. There are actually (at least) two probabil- 
ity measures for space-filling curves (or dense polymers) 
in d = 2, depending on whether they are strictly non- 
intersecting, or self-intersections are discouraged but not 
forbidden [51]. Whether or not TSP is in the same uni- 
versality class as dense polymers in any dimension, or 
for any subset of its properties, a similar topological dis- 
tinction probably holds for TSP [11]. A two-dimensional 
version of the TSP that allows the curves to cross can 
be obtained using a tour in a three-dimensional slab of 
small thickness in one direction, that is large in the two 
orthogonal directions. On large scales, this problem is 
effectively two dimensional, and the optimum tour pro- 
jected into these two dimensions will intersect itself. Such 
problems will define a distinct universality class of TSPs 
from the usual planar (non-self-intersecting) one. It will 
be the natural continuation of the TSP universality class 
for d > 2 to d = 2, as in the case of dense polymers 
[51,11]. The suggestion in Ref. [11] that the TSP is in the 
universality class of dense polymers for d > 2 (where the 
d = 2 case means the version with intersections) implies 
that the critical dimension is 2, at least for the geomet- 
ric correlation properties (that is, d = 2 is analogous to 
d = 8 for MSTs [4]). It would be interesting to use the 
mean-field approach [49] in finite dimensions to calculate 
a mean-field value of $\theta$ for the TSP for sufficiently high
$d$, and to find the value of $d_c$ for the TSP.

In an interesting paper, Moore [10] applied the idea
of $\theta$ (which he called $y$) to combinatorial optimization
problems. He argued that for the TSP, $\theta = 1$ for all
dimensions $d$. His argument was based on the analysis of
the relative error in a partitioning algorithm by Karp [50].
Inspection of this analysis shows that the relative error is
related to the first boundary term in an expansion for a
hypercube with free boundaries, $\overline{\ell}_{\rm opt} \sim \beta L^d + \beta_1 L^{d-1} +
\ldots$ ($\beta_1$ has been shown to be positive [48]). If a large
system is partitioned into such cubes, which are solved
separately, then there will be errors of this form for each
cube [50], which would be absent in a better scheme.
Further, as we have seen, the boundary terms for the
whole system do not scale with exponent $\theta$. Accordingly,
we do not believe that this is a valid determination of the
value of $\theta$ for the TSP.

A perfect matching is by definition a subgraph of $G$
that includes all the vertices of $G$, such that every vertex
is on exactly one edge of the matching. In minimum
weighted matching, one must find a perfect matching
such that the cost, which is the sum of the costs of the
"occupied" edges (those on the matching), is minimized
[1,2,46]. The case in which $G$ is bipartite (there are two
sets of vertices $U$ and $V$, with $|U| = |V| = N$, and only
edges that connect a member of $U$ to one of $V$) is a little
easier to solve, and is also known as the assignment
problem. The Euclidean bipartite minimum matching
problem (which is also known as two-sample matching),
in which the vertices in $U \cup V$ are distributed, for example,
independently and uniformly over a domain such
as $[0, L]^d$ (with $N/L^d = 1$) has the curious property (as
quoted in Ref. [2]) that, at least for two dimensions, the
mean optimum cost is of order $L^2 (\ln L)^{1/2}$. This is not
the case for the unrestricted (non-bipartite) Euclidean
problem [2]. Minimum weight matching occurs (though
not with Euclidean distance as the cost) in finding the
ground state of an Ising spin glass in two dimensions,
with free boundary conditions, and also in other physical
problems. Leaving aside cases like the two-dimensional
Euclidean bipartite one that may require special treatment,
we again argue that $\theta \le 0$, on the basis of the absence
of a transition in mean-field theory [52]. There is a
similar picture of positive temperature causing a flow to
the "uniform matchings" problem, also known as "dimer
packing"; in this, the high temperature limit of the partition
function, the sum is over all matchings with equal
Boltzmann-Gibbs weight.

It should not be imagined that 9 < in all combina- 
torial optimization problems, even in those that can be 
solved in polynomial computation time. The shortest- 
path problem (for two given vertices separated by dis- 
tance L, find the path between them of lowest total cost, 
where non-negative random costs are assigned indepen- 
dently to each edge of a lattice) is equivalent to the di- 
rected polymer problem (see especially Ref. [53]). The 
variations in the cost of the optimal path scale as $L^\theta$,
with $\theta > 0$ for all dimensions d > 1, and $\theta = 1/3$ in
two dimensions. If the cost is viewed as "time", then
shortest-path becomes first-passage percolation. Other 
generalizations of shortest-path have been considered in 
statistical physics, including that in which the directed 
path is replaced by a $d$-dimensional surface (e.g. a domain
wall) in $(d+1)$-dimensional space, each point of the
surface has a unique projection to the $x_{d+1} = 0$ coordinate
hyperplane, and the cost (or energy) is the sum of
random costs assigned to faces of the lattice occupied by
the surface [54]. We will assume here that the projection
of the surface to $x_{d+1} = 0$ is a $d$-dimensional hypercube
of side L, and that the boundary of the surface is fixed
in the $x_{d+1} = 0$ hyperplane. For $d > d_c = 4$, $\theta$ takes
on the mean-field-like value $d - 2$, when $-\theta$ is defined
as the scaling dimension for temperature [54]. However,
the leading finite-size correction to $\beta L^d$ in the mean optimum
cost (ground state energy) involves the disorder,
which is irrelevant for $d > 4$, and hence the correction
term is $\propto L^{d-2}/L^{d-4} = L^2$.
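
Returning to the shortest-path (first-passage percolation) problem
described above, the following is a minimal sketch of how one sample of
the optimal path cost could be generated numerically. It is an
illustration only: the function name `shortest_path_cost`, the restriction
to a finite (L+1) x (L+1) box of the square lattice, the uniform cost
distribution on [0, 1), and the use of Dijkstra's algorithm are all
assumptions made here and are not taken from the text.

```python
import heapq
import random


def shortest_path_cost(L, seed=0):
    """Lowest total cost of a path between two sites a distance L apart.

    Assumptions (hypothetical, for illustration): square lattice restricted
    to an (L+1) x (L+1) box, i.i.d. uniform [0, 1) cost on every edge,
    source and target on the left and right edges of the box.
    """
    rng = random.Random(seed)
    n = L + 1

    # Draw one non-negative random cost per nearest-neighbour edge.
    cost = {}
    for x in range(n):
        for y in range(n):
            if x + 1 < n:
                cost[((x, y), (x + 1, y))] = rng.random()
            if y + 1 < n:
                cost[((x, y), (x, y + 1))] = rng.random()

    def neighbours(site):
        x, y = site
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nb[0] < n and 0 <= nb[1] < n:
                # Edges are stored with the smaller site first.
                yield nb, cost[(min(site, nb), max(site, nb))]

    source, target = (0, L // 2), (L, L // 2)
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, site = heapq.heappop(heap)
        if site == target:
            return d
        if d > dist.get(site, float("inf")):
            continue  # stale heap entry
        for nb, c in neighbours(site):
            nd = d + c
            if nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                heapq.heappush(heap, (nd, nb))
    raise ValueError("target unreachable")


if __name__ == "__main__":
    # Sample-to-sample spread of the optimal cost at one size L.
    samples = [shortest_path_cost(32, seed=s) for s in range(20)]
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(mean, var ** 0.5)
```

Repeating the sampling for several values of L and comparing the spread of
the costs would be one way to probe the growth of the fluctuations with L,
although reliable exponent estimates would need much larger sizes and many
more samples than this sketch uses.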

V. CONCLUSION 

The central results of this paper concern the behavior
of the correlation length $\xi \propto T^{-\nu}$ as $T \to 0$, and
the finite-size correction to the optimum cost $\propto L^\theta$, with
the scaling relation $\nu = -1/\theta$. We find that $\nu = \nu_{\rm perc}$,
the correlation length exponent in classical percolation, 
for all dimensions d. This result rests on the identifica- 
tion of the "critical edges" that have cost close to the 
percolation threshold, as these edges connect the tree 
over large scales, and can be replaced by one another 
at low change in cost (of order T or $L^{-1/\nu}$ per edge for
the positive-temperature and finite-size situations,
respectively). Although it is sometimes said that there is
no phase transition behavior in optimization, the results 
presented here can be understood as a transition occur- 
ring right at T = 0. 
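
The scaling statements of this paragraph can be collected in one place. The
block below is a heuristic summary rather than a derivation: it equates the
two per-edge cost scales quoted above, $T$ and $L^{-1/\nu}$, to recover the
crossover length. The two-dimensional value $\nu_{\rm perc} = 4/3$ is the
standard percolation result; it is not stated in this section and is
inserted here as an assumption for illustration.

```latex
\begin{align}
  T \sim L^{-1/\nu}
    \quad &\Longrightarrow \quad
    L \sim T^{-\nu} \equiv \xi , \\
  \text{finite-size correction to the optimum cost}
    &\propto L^{\theta}, \qquad \theta = -1/\nu , \\
  \nu &= \nu_{\rm perc} , \\
  d = 2: \quad \nu_{\rm perc} = 4/3
    \quad &\Longrightarrow \quad \theta = -3/4 .
\end{align}
```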

We used Kruskal's greedy algorithm in many of the ar- 
guments, but the results we obtain are about the MST 
(or near-optimal, thermally excited trees), and do not
depend on the algorithm used. Thus this is not an "anal- 
ysis of an algorithm" in a traditional sense. There may 
still be more to be learned by using other algorithms. 
It would be interesting to analyze other problems that 
possess polynomial-time algorithms (notably, minimum 
matching) in a similar manner. 
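
For readers who want to experiment, the greedy construction referred to
above can be sketched as follows. This is a minimal illustration, not the
procedure used to obtain any result in this paper: the function name
`kruskal_grid`, the L x L square lattice with free boundaries, the uniform
[0, 1) edge costs, and the union-find bookkeeping are assumptions made here.

```python
import random


def kruskal_grid(L, seed=0):
    """Kruskal's greedy algorithm on an L x L square lattice.

    Assumptions (hypothetical, for illustration): free boundary conditions
    and i.i.d. uniform [0, 1) edge costs. Returns the list of accepted
    edges (cost, i, j) in the order in which they were accepted.
    """
    rng = random.Random(seed)
    n = L * L
    edges = []
    for x in range(L):
        for y in range(L):
            i = x * L + y
            if x + 1 < L:
                edges.append((rng.random(), i, i + L))
            if y + 1 < L:
                edges.append((rng.random(), i, i + 1))
    edges.sort()  # scan edges in order of increasing cost

    parent = list(range(n))  # union-find forest over the vertices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    tree = []
    for cost, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # accept the edge only if it creates no cycle
            parent[ri] = rj
            tree.append((cost, i, j))
            if len(tree) == n - 1:
                break
    return tree


if __name__ == "__main__":
    tree = kruskal_grid(32)
    print(len(tree), sum(cost for cost, _, _ in tree))
```

With uniform costs, the edges accepted before the scan reaches cost p join
exactly the connected clusters of bond percolation at occupation probability
p on the same lattice, which is the relation between the greedy algorithm
and percolation that the arguments rely on.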

The discussion of universality classes, and our suggestions
(see also Ref. [11]) that the minimum spanning tree,
the minimum Steiner tree, and even the two-dimensional
traveling salesman problem may be in the same universality
class, serve to illustrate that the universal scaling
properties discussed in this paper may have very little to do
with the computational complexity issues of P versus NP
[47], which seem to depend entirely on details of the
definition of the optimization problem at short length scales
(some related observations are made in Ref. [55]). Possi- 
bly this is due to the difference between the average-case 
behavior that is analyzed here and related to universal- 
ity classes, and the worst-case computational complexity 
characterized by P or NP. On the other hand, the scaling 
properties may be very useful for understanding the ef- 
fectiveness of algorithmic techniques (such as local search 
and randomized algorithms) and approximation schemes, 
when they are applied to hard random problems in d di- 
mensions. 